Abstract

In the kernel clustering problem we are given a (large) $n\times n$ symmetric positive semidefinite matrix
$A=(a_{ij})$ with $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$ and a (small) $k\times k$ symmetric positive semidefinite
matrix $B=(b_{ij})$. The goal is to find a partition $\{S_1,\ldots,S_k\}$ of $\{1,\ldots,n\}$ which maximizes
$\sum_{i=1}^k\sum_{j=1}^k\left(\sum_{(p,q)\in S_i\times S_j}a_{pq}\right)b_{ij}$. We design a polynomial time
approximation algorithm that achieves an approximation ratio of $\frac{R(B)^2}{C(B)}$, where $R(B)$ and $C(B)$
are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij}=\langle v_i,v_j\rangle$
is the Gram matrix representation of $B$ for some $v_1,\ldots,v_k\in\mathbb{R}^k$ then $R(B)$ is the minimum radius
of a Euclidean ball containing the points $\{v_1,\ldots,v_k\}$. The parameter $C(B)$ is defined as the maximum over
all measurable partitions $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ of the quantity
$\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle$, where for $i\in\{1,\ldots,k\}$ the vector
$z_i\in\mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e.,
$z_i=\frac{1}{(2\pi)^{(k-1)/2}}\int_{A_i}xe^{-\|x\|_2^2/2}dx$. We also show that for every $\varepsilon>0$,
achieving an approximation guarantee of $(1-\varepsilon)\frac{R(B)^2}{C(B)}$ is Unique Games hard.

1 Introduction 

Kernel Clustering [13] is a combinatorial optimization problem which originates in the theory of machine
learning. It is a general framework for clustering massive statistical data so as to uncover a certain
hypothesized structure. The problem is defined as follows: let $A=(a_{ij})$ be an $n\times n$ symmetric positive
semidefinite matrix which is usually normalized to be centered, i.e., $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$. The
matrix $A$ is often thought of as the correlation matrix of random variables $(X_1,\ldots,X_n)$ that measure
attributes of certain empirical data, i.e., $a_{ij}=\mathbb{E}[X_iX_j]$. We are also given another symmetric
positive semidefinite $k\times k$ matrix $B=(b_{ij})$ which functions as a hypothesis, or test matrix. Think of $n$
as huge and $k$ as small. The goal is to cluster $A$ so as to obtain a smaller matrix which most resembles $B$.
Formally, we wish to find a partition $\{S_1,\ldots,S_k\}$ of $\{1,\ldots,n\}$ so that if we write
$c_{ij}:=\sum_{(p,q)\in S_i\times S_j}a_{pq}$ (i.e., we form a $k\times k$ matrix $C=(c_{ij})$ by clustering $A$
according to the given partition), then the resulting clustered version of $A$ has the maximum correlation
$\sum_{i=1}^k\sum_{j=1}^k c_{ij}b_{ij}$ with the hypothesis matrix $B$. Equivalently, the goal is to evaluate the
number:

$$\mathrm{Clust}(A|B):=\max_{\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}. \tag{1}$$
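To make the definition of Clust(A|B) concrete, the following is a minimal sketch that evaluates it by exhaustive search over all assignments of the n rows to k labels. It is only meant for tiny instances (the search has k^n cases); the function name `clust_brute_force` and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np

def clust_brute_force(A, B):
    """Evaluate Clust(A|B) by enumerating every assignment sigma: {0..n-1} -> {0..k-1}.

    Hypothetical helper for illustration only: exponential time, small n and k.
    Returns the maximal value and one assignment attaining it.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n, k = A.shape[0], B.shape[0]
    best_value, best_sigma = -np.inf, None
    for sigma in itertools.product(range(k), repeat=n):
        idx = np.array(sigma)
        # B[np.ix_(idx, idx)] is the n x n matrix with entries b_{sigma(i) sigma(j)}
        value = float(np.sum(A * B[np.ix_(idx, idx)]))
        if value > best_value:
            best_value, best_sigma = value, sigma
    return best_value, best_sigma
```

For instance, with the centered rank-one matrix A = x x^T, x = (1, 1, -1, -1), and B the 2 x 2 identity, the search groups the two positive and the two negative coordinates together.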

The strength of this generic clustering framework is based in part on the flexibility of adapting the 
matrix B to the problem at hand. Various particular choices of B lead to well studied optimization problems, 
while other specialized choices of B are based on statistical hypotheses which have been applied with some 
empirical success. We refer to [13, 7] for additional background and a discussion of specific examples.

In [7] we investigated the computational complexity of the kernel clustering problem. Answering a
question posed in [13], we showed that this problem has a constant factor polynomial time approximation
algorithm. We refer to [7] for more information on the best known approximation guarantees. We also
obtained hardness results for kernel clustering under various complexity assumptions. For example, we
showed in [7] that when $B=I_3$ is the $3\times 3$ identity matrix then a $\frac{16\pi}{27}$ approximation
guarantee for $\mathrm{Clust}(A|I_3)$ is achievable, while any approximation guarantee smaller than
$\frac{16\pi}{27}$ is Unique Games hard. We will discuss the Unique Games Conjecture (UGC) presently. At this
point it suffices to say that the above statement is evidence that the hardness threshold of the problem of
approximating $\mathrm{Clust}(A|I_3)$ is $\frac{16\pi}{27}$, or more modestly that obtaining a polynomial time
algorithm which approximates $\mathrm{Clust}(A|I_3)$ up to a factor smaller than $\frac{16\pi}{27}$ would require
a major breakthrough.

Another result proved in [7] is that when $k\ge 3$ and $B$ is either the $k\times k$ identity matrix or is
spherical (i.e., $b_{ii}=1$ for all $i\in\{1,\ldots,k\}$) and centered (i.e., $\sum_{i=1}^k\sum_{j=1}^k b_{ij}=0$)
then there is a polynomial time approximation algorithm which, given $A$, approximates $\mathrm{Clust}(A|B)$ to
within a factor of $\frac{8\pi}{9}\left(1-\frac{1}{k}\right)$. We also presented in [7] a conjecture (called the
Propeller Conjecture) which we proved would imply that $\frac{8\pi}{9}\left(1-\frac{1}{k}\right)$ is the UGC
hardness threshold when $B=I_k$. We refer to [7] for more information on the Propeller Conjecture, which at
present remains open.

The above quoted result from [7] settles the problem of evaluating the UGC hardness threshold of the
following type of algorithmic task: given $A$ and a hypothesis matrix $B$ which is guaranteed to belong to a
certain class of matrices (in our case centered and spherical), approximate efficiently the number
$\mathrm{Clust}(A|B)$. Naturally this can be refined to a family of optimization problems which depend on a fixed
$B$: for each $B$, what is the UGC hardness threshold of the problem of, given $A$, approximating
$\mathrm{Clust}(A|B)$? In [7] we answered this question only when $B=I_3$, and for $B=I_k$ assuming the Propeller
Conjecture, and asked about the case of general $B$ (we did give some $B$-dependent bounds in [7], but they were
not sharp for $B\neq I_k$ for reasons that will become clear presently). This is a natural question since it makes
sense to use the best possible polynomial time algorithm if we know $B$ in advance.

Here we answer the above question in full generality. To explain our results we need to define two
geometric parameters which are associated to $B$. Since $B$ is symmetric and positive semidefinite we can find
vectors $v_1,\ldots,v_k\in\mathbb{R}^k$ such that $B$ is their Gram matrix, i.e., $b_{ij}=\langle v_i,v_j\rangle$
for all $i,j\in\{1,\ldots,k\}$. Let $R(B)$ be the smallest possible radius of a Euclidean ball in $\mathbb{R}^k$
which contains $\{v_1,\ldots,v_k\}$ and let $w(B)$ be the center of this ball. Let $C(B)$ be the maximum over all
partitions $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ into measurable sets of the quantity
$\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle$, where for $i\in\{1,\ldots,k\}$ the vector
$z_i\in\mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e.,
$z_i=\frac{1}{(2\pi)^{(k-1)/2}}\int_{A_i}xe^{-\|x\|_2^2/2}dx$ (this maximum exists, as shown in Section 2). Our
main result is the following theorem^1:

Theorem 1.1. For every symmetric positive semidefinite $k\times k$ matrix $B$ there exists a randomized
polynomial time algorithm which, given an $n\times n$ symmetric positive semidefinite centered matrix $A$, outputs
a number $\mathrm{Alg}(A)$ such that

$$\mathrm{Clust}(A|B)\le\mathbb{E}[\mathrm{Alg}(A)]\le\frac{R(B)^2}{C(B)}\,\mathrm{Clust}(A|B).$$

On the other hand, assuming the Unique Games Conjecture, no polynomial time algorithm approximates
$\mathrm{Clust}(A|B)$ to within a factor strictly smaller than $\frac{R(B)^2}{C(B)}$.

^1 We refer to the discussion in Question 1 in Section 1.1 below which addresses the issue of computing
efficiently good approximate clusterings rather than approximating only the value $\mathrm{Clust}(A|B)$.

As an example of Theorem 1.1 for a particular hypothesis matrix consider the following perturbation of
the previously studied case $B=I_3$:

$$B_c:=\begin{pmatrix}1&0&0\\0&1&0\\0&0&c\end{pmatrix},$$

where $c>0$ is a parameter. The problem of approximating efficiently $\mathrm{Clust}(A|B_c)$ corresponds to
partitioning the rows of $A$ into 3 sets $S_1,S_2,S_3\subseteq\{1,\ldots,n\}$ and maximizing the sum of the total
masses of $A$ on $S_1\times S_1$, $S_2\times S_2$, $S_3\times S_3$, where the parameter $c$ can be used to tune the
weight of the set $S_3$. This problem is not particularly important; we chose it just as a concrete example for
the sake of illustration.
In Section |6] we compute the parameters R{Bc),C{Bc) and deduce that the UGC hardness threshold of the 
problem of computing Clust(A|Bc) equals ^'^i'^2c^ ii c > j and equals '^2+4^ if c < 5. The change ate = ^ 
corresponds to a qualitative change in the best algorithm for computing $\mathrm{Clust}(A|B_c)$; we refer to
Section 6 for an explanation.

In the remainder of this introduction we will explain the various ingredients of Theorem 1.1 (in particular
the Unique Games Conjecture), and the new ideas used in its proof.

The main tool in the design of the algorithm in Theorem 1.1 is a natural generalization of the positive
semidefinite Grothendieck inequality. In [4] Grothendieck proved that there exists a universal constant
$K>0$ such that for every $n\times n$ symmetric positive semidefinite matrix $A=(a_{ij})$ we have^2:

$$\max_{x_1,\ldots,x_n\in S^{n-1}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\le K\max_{\varepsilon_1,\ldots,\varepsilon_n\in\{-1,1\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\varepsilon_i\varepsilon_j. \tag{2}$$

The best constant $K$ in (2) was shown in [11] to be equal to $\frac{\pi}{2}$. A natural variant of (2) is to
replace the numbers $-1,1$ by general vectors $v_1,\ldots,v_k\in\mathbb{R}^k$, namely one might ask for the
smallest constant $K>0$ such that for every symmetric positive semidefinite $n\times n$ matrix $A$ we have:

$$\max_{x_1,\ldots,x_n\in S^{n-1}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\le K\max_{u_1,\ldots,u_n\in\{v_1,\ldots,v_k\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle u_i,u_j\rangle. \tag{3}$$



In Section 3 we prove that (3) holds with $K=\frac{1}{C(B)}$, where $B=(\langle v_i,v_j\rangle)$ is the Gram
matrix of $v_1,\ldots,v_k$, and that this constant is sharp. This inequality is proved along the following lines.
Fix $n$ unit vectors $x_1,\ldots,x_n\in S^{n-1}$. Let $G=(g_{ij})$ be a $(k-1)\times n$ random matrix whose entries
are i.i.d. standard Gaussian random variables. Let $A_1,\ldots,A_k\subseteq\mathbb{R}^{k-1}$ be a measurable
partition of $\mathbb{R}^{k-1}$ at which $C(B)$ is attained. Define a random choice of
$u_i\in\{v_1,\ldots,v_k\}$ by setting $u_i=v_\ell$ for the unique $\ell\in\{1,\ldots,k\}$ such that
$Gx_i\in A_\ell$. The fact that (3) holds with $K=\frac{1}{C(B)}$ is a consequence of the following fact, which we
prove in Section 3:

$$\mathbb{E}\left[\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle u_i,u_j\rangle\right]\ge C(B)\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle. \tag{4}$$
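The random choice of labels described above can be sketched in a few lines. This is an illustration under assumptions, not the paper's algorithm: the partition is supplied through a hypothetical callable `region_of` that returns the index of the set containing a given point, and the names `gaussian_round` and `rounded_value` are invented for this sketch.

```python
import numpy as np

def gaussian_round(X, region_of, m, rng):
    """Label each unit vector x_i (a row of X) using a random Gaussian matrix G.

    X         : (n, d) array whose rows are the unit vectors x_1, ..., x_n.
    region_of : hypothetical callable mapping a point of R^m to the index l of
                the partition element A_l that contains it.
    m         : dimension of the partitioned space (k - 1 in the text).
    rng       : a numpy.random.Generator.
    """
    d = X.shape[1]
    G = rng.standard_normal((m, d))  # m x d matrix with i.i.d. standard Gaussian entries
    return np.array([region_of(G @ x) for x in X])

def rounded_value(A, V, labels):
    """Compute sum_{i,j} a_ij <u_i, u_j> where u_i = V[labels[i]]."""
    U = V[labels]
    return float(np.sum(A * (U @ U.T)))
```

As a usage example, for k = 2 one may take the partition of the real line into the two half-lines, `region_of = lambda y: 0 if y[0] >= 0 else 1`, so that antipodal unit vectors always receive different labels.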



The crucial point in the proof of (4) is the following identity, proved in Lemma 3.2 as a corollary of the
closed-form formula for the Poisson kernel of the Hermite polynomials: for every two measurable subsets
$E,F\subseteq\mathbb{R}^{k-1}$ and any two unit vectors $x,y\in\mathbb{R}^n$, we have

$$\Pr[Gx\in E\ \text{and}\ Gy\in F]=\gamma_{k-1}(E)\gamma_{k-1}(F)+\langle x,y\rangle\left\langle\int_E u\,d\gamma_{k-1}(u),\int_F u\,d\gamma_{k-1}(u)\right\rangle+\sum_{\ell=2}^\infty\langle x,y\rangle^\ell\sum_{\substack{s\in(\mathbb{N}\cup\{0\})^{k-1}\\ s_1+\cdots+s_{k-1}=\ell}}\alpha_s(E)\alpha_s(F), \tag{5}$$

for some real coefficients $\{\alpha_s(E)\}_{s\in(\mathbb{N}\cup\{0\})^{k-1}},
\{\alpha_s(F)\}_{s\in(\mathbb{N}\cup\{0\})^{k-1}}\subseteq\mathbb{R}$. Here $\gamma_{k-1}$ denotes the standard
Gaussian measure on $\mathbb{R}^{k-1}$. The product structure of the decomposition (5) hints at the role of the
fact that $A$ is positive semidefinite in the proof of (4); the complete details appear in Section 3.

^2 This inequality is sometimes written as
$\max_{x_i,y_j\in S^{n-1}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,y_j\rangle\le K\max_{\varepsilon_i,\delta_j\in\{-1,1\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\varepsilon_i\delta_j$,
but it is easy (and standard) to verify that since $A$ is positive semidefinite this formulation coincides with (2).

Once the generalized Grothendieck inequality (3) is obtained with $K=\frac{1}{C(B)}$ it is simple to design the
algorithm whose existence is claimed in Theorem 1.1, which is based on semidefinite programming; this
is done in Section 4.

We shall now pass to an explanation of the hardness result in Theorem 1.1. The Unique Games Conjecture,
posed by Khot in [6], is as follows. A Unique Game is an optimization problem with an instance
$\mathcal{L}=\mathcal{L}(G(V,W,E),n,\{\pi_{vw}\}_{(v,w)\in E})$. Here $G(V,W,E)$ is a regular bipartite graph with
vertex sets $V$ and $W$ and edge set $E$. Each vertex is supposed to receive a label from the set
$\{1,\ldots,n\}$. For every edge $(v,w)\in E$ with $v\in V$ and $w\in W$, there is a given permutation
$\pi_{vw}:\{1,\ldots,n\}\to\{1,\ldots,n\}$. A labeling of the Unique Game instance is an assignment
$\rho:V\cup W\to\{1,\ldots,n\}$. An edge $(v,w)$ is satisfied by a labeling $\rho$ if and only if
$\rho(v)=\pi_{vw}(\rho(w))$. The goal is to find a labeling that maximizes the fraction of edges satisfied (call
this maximum $\mathrm{OPT}(\mathcal{L})$). We think of the number of labels $n$ as a constant and the size of the
graph $G(V,W,E)$ as the size of the problem instance. The Unique Games Conjecture (UGC) asserts that for
arbitrarily small constants $\varepsilon,\delta>0$, there exists a constant $n=n(\varepsilon,\delta)$ such that no
polynomial time algorithm can distinguish whether a Unique Games instance
$\mathcal{L}=\mathcal{L}(G(V,W,E),n,\{\pi_{vw}\}_{(v,w)\in E})$ satisfies $\mathrm{OPT}(\mathcal{L})\le\delta$
(soundness) or there exists a labeling such that for a $1-\varepsilon$ fraction of the vertices $v\in V$ all the
edges incident with $v$ are satisfied (completeness)^3. This conjecture is (by now) a commonly used complexity
assumption to prove hardness of approximation results. Despite several recent attempts to get better polynomial
time approximation algorithms for the Unique Game problem (see the table in [3] for a description of known
results), the Unique Games Conjecture still stands.
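The objective of a Unique Game instance can be illustrated with a short sketch. The data layout below (edges as pairs, permutations as tuples indexed by label, labels numbered from 0) and the function names are assumptions made for this example; the exhaustive search is only feasible for toy instances.

```python
from itertools import product

def satisfied_fraction(edges, perms, labeling):
    """Fraction of edges (v, w) with labeling[v] == pi_vw(labeling[w]).

    edges    : list of pairs (v, w) with v in V and w in W.
    perms    : dict mapping each edge to a tuple pi, where pi[b] is the image of label b.
    labeling : dict mapping every vertex to a label in {0, ..., n_labels - 1}.
    """
    satisfied = sum(1 for (v, w) in edges if labeling[v] == perms[(v, w)][labeling[w]])
    return satisfied / len(edges)

def opt_brute_force(V, W, edges, perms, n_labels):
    """Maximum satisfied fraction over all labelings (exponential time, toy instances only)."""
    vertices = list(V) + list(W)
    best = 0.0
    for labels in product(range(n_labels), repeat=len(vertices)):
        labeling = dict(zip(vertices, labels))
        best = max(best, satisfied_fraction(edges, perms, labeling))
    return best
```

On the complete bipartite graph with two vertices on each side and two labels, taking the identity permutation on three edges and the swap on the fourth yields an instance in which at most three of the four constraints can hold simultaneously.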

Our UGC hardness result follows the standard "dictatorship test" approach which is prevalent in PCP
based hardness proofs, with a new twist which seems to be of independent interest. Since the kernel
clustering problem is concerned with an assignment of one of $k$ labels to each of the rows of the matrix $A$,
the natural setting of our hardness proof is a dictatorship test for functions on $\{1,\ldots,k\}^n$ taking
values in $\{1,\ldots,k\}$ (this was already the case in [7]). The general "philosophy" of such hardness proofs is
to associate to every such function a certain numerical parameter called the "objective value" (which is adapted
to the optimization problem at hand). The general scheme is to show that for some numbers $a,b>0$, if $f$ depends
on only one coordinate (i.e., it is a "dictatorship") then the objective value of $f$ is at least $a$, while if
$f$ does not have any coordinate which is too influential then the objective value of $f$ is at most $b+o(1)$
(the $o(1)$ depends on the notion of having no influential coordinates and its exact form is not important for
the purpose of this overview; we refer to Section 5 for details). Once such a result is proved, techniques from
the theory of Probabilistically Checkable Proofs can show that under a suitable complexity theoretic assumption
(in our case the UGC) no polynomial time algorithm can achieve an approximation factor smaller than $\frac{a}{b}$.

^3 This version of the UGC is not the standard version as stated in [6], which only requires
$\mathrm{OPT}(\mathcal{L})\ge 1-\varepsilon$ in the completeness. However, it was shown in [8] that this seemingly
stronger version of the UGC actually follows from the original UGC; we will require this stronger statement in
our proofs.
Implicit to the above discussion is an underlying product distribution on $\{1,\ldots,k\}^n$ with respect to
which we measure the influence of variables. In [7] the case of $B=I_k$ was solved using the uniform distribution
on $\{1,\ldots,k\}$. It turns out that in order to prove the sharp hardness result in Theorem 1.1 we need to use
a non-uniform distribution which depends on the geometry of $B$. Namely, writing $B$ as a Gram matrix
$b_{ij}=\langle v_i,v_j\rangle$, recall that $R(B)$ is the radius of the smallest Euclidean ball containing
$\{v_1,\ldots,v_k\}$ and $w(B)$ is the center of this ball. A simple separation argument shows that $w(B)$ is in
the convex hull of the vectors in $\{v_1,\ldots,v_k\}$ whose distance from $w(B)$ is exactly $R(B)$. Writing
$w(B)$ as a convex combination of these points and considering the coefficients of this convex combination
results in a probability distribution on $\{1,\ldots,k\}$. In our hardness proof we use the $n$-fold product of
(a small perturbation of) this probability distribution as the underlying distribution on $\{1,\ldots,k\}^n$ for
our dictatorship test; see Figure 1 for a schematic description of the situation described above. The full
details of this approach, including all the relevant definitions, are presented in Section 5.
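The geometric data entering this construction (the radius of the smallest enclosing ball and a convex combination representing its center) can be approximated numerically directly from the Gram matrix. The sketch below uses a simple Badoiu-Clarkson style iteration, which is a technique swapped in here for illustration and is not taken from the paper; the function name and the iteration count are assumptions, and the output is only an approximation whose radius estimate is never smaller than the true value.

```python
import numpy as np

def enclosing_ball_weights(B, num_iters=3000):
    """Approximate the smallest ball enclosing v_1, ..., v_k from their Gram matrix B.

    Returns (p, radius_sq): p is a probability vector with center w = sum_j p_j v_j,
    and radius_sq is the squared distance from w to the farthest v_i.
    Illustrative iteration only; convergence is slow for general B.
    """
    B = np.asarray(B, dtype=float)
    k = B.shape[0]
    diag = np.diag(B)
    p = np.zeros(k)
    p[0] = 1.0  # start with the center at v_1
    for t in range(1, num_iters + 1):
        # ||v_i - w||^2 = b_ii - 2 (B p)_i + p^T B p; the last term does not depend on i
        far = int(np.argmax(diag - 2.0 * (B @ p)))
        step = np.zeros(k)
        step[far] = 1.0
        p += (step - p) / (t + 1)  # move the center towards the farthest point
    radius_sq = float(np.max(diag - 2.0 * (B @ p)) + p @ B @ p)
    return p, radius_sq
```

For the 3 x 3 identity matrix the weights approach the uniform distribution on the three labels and the squared radius approaches 2/3, the squared circumradius of an equilateral triangle with side length sqrt(2).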




Figure 1: The geometry of the test matrix $B$ induces a dictatorship test: the points above are the vectors
$\{v_1,\ldots,v_k\}\subseteq\mathbb{R}^k$ such that $B$ is their Gram matrix. The ball depicted above is the
smallest Euclidean ball containing $\{v_1,\ldots,v_k\}$, $R(B)$ is its radius and $w(B)$ is its center. Then
$w(B)$ is in the convex hull of the points in $\{v_1,\ldots,v_k\}$ which are at distance exactly $R(B)$ from
$w(B)$. Writing $w(B)$ as a convex combination of these boundary points yields a distribution over the labels
$\{1,\ldots,k\}$. Our dictatorship test corresponds to selecting a point from the $n$-fold power of this
probability space and comparing the behavior of a certain "objective value" (defined in equation (31) below),
which depends only on the singleton Fourier coefficients, for dictatorships and for functions with low influences.



1.1 Open problems 

We end this introduction with a statement of some open problems. 

Question 1. Theorem 1.1 shows that the UGC hardness threshold of the problem of computing
$\mathrm{Clust}(A|B)$ for a fixed hypothesis matrix $B$ equals $\frac{R(B)^2}{C(B)}$. It is natural to ask if
there is also a polynomial time algorithm which outputs a clustering of $A$ whose value is within a factor of
$\frac{R(B)^2}{C(B)}$ of the optimal clustering. The issue is that our rounding algorithm uses the partition
$\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ at which $C(B)$ is attained. In Section 2 we study this optimal
partition, and show that it has a relatively simple structure rather than being composed of general measurable
sets: it corresponds to cones which are induced by the faces of a simplex. This information allows us to compute
efficiently a partition which comes as close as we wish to the optimal partition when $k$ is fixed, or grows
slowly with $n$ (to be safe let's just say for the sake of argument that $k\approx\log\log n$ works). We refer to
Remark 2.3 for details. We currently do not know if there is a polynomial time rounding algorithm when, say,
$k\approx\sqrt{n}$. Given $\varepsilon>0$, is there an algorithm which, given $A$ and $B$, computes
$\mathrm{Clust}(A|B)$ to within a factor of $(1+\varepsilon)\frac{R(B)^2}{C(B)}$, and runs in time which is
polynomial in both $n$ and $k$ (and maybe even $1/\varepsilon$)?

Question 2. We remind the reader that the Propeller Conjecture remains open. This conjecture is about the
value of $C(I_k)$ when $k\ge 4$. It states that the partition at which $C(I_k)$ is attained is actually much
simpler than what one might initially expect: only 3 of the sets have positive measure and they form a cylinder
over a planar 120° "propeller". We refer to [7] for a precise formulation and some evidence for the validity of
the Propeller Conjecture.

Question 3. The kernel clustering problem was stated in [13] for matrices $A$ which are centered. This
makes sense from the perspective of machine learning, but it seems meaningful to also ask for the UGC
hardness threshold of the same problem when $A$ is not assumed to be centered. In the present paper we did
not investigate this case at all, and it seems that the exact UGC hardness threshold when $A$ is not necessarily
centered is not known for any interesting hypothesis matrix $B$. Note that in [7] we showed that there is a
constant factor polynomial time approximation algorithm when $A$ is not necessarily centered: we obtained
in [71 an approximation guarantee of 1 + in this case, but this is probably suboptimal. 



2 Preliminaries on the parameter $C(B)$

Let $B=(b_{ij})_{i,j=1}^k\in M_k(\mathbb{R})$ be a $k\times k$ symmetric positive semidefinite matrix. In what
follows we fix $k\ge 2$ and the matrix $B$. We also fix vectors $v_1,\ldots,v_k\in\mathbb{R}^k$ for which
$b_{ij}=\langle v_i,v_j\rangle$ for all $i,j\in\{1,\ldots,k\}$.

Let $\gamma_n$ denote the standard Gaussian measure on $\mathbb{R}^n$, i.e., the density of $\gamma_n$ is
$\frac{1}{(2\pi)^{n/2}}e^{-\|x\|_2^2/2}$. We denote by $H_k$ the Hilbert space
$L_2(\gamma_n)\oplus L_2(\gamma_n)\oplus\cdots\oplus L_2(\gamma_n)$ ($k$ times) and we consider the convex subset
$\Delta_k(\gamma_n)\subseteq H_k$ given by:

$$\Delta_k(\gamma_n):=\left\{(f_1,\ldots,f_k)\in H_k:\ \forall j\in\{1,\ldots,k\}\ f_j\ge 0\ \wedge\ \sum_{j=1}^k f_j=1\right\}. \tag{6}$$

Define:

$$C(n,B):=\sup_{(f_1,\ldots,f_k)\in\Delta_k(\gamma_n)}\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{\mathbb{R}^n}xf_i(x)\,d\gamma_n(x),\int_{\mathbb{R}^n}xf_j(x)\,d\gamma_n(x)\right\rangle. \tag{7}$$

The following lemma is a variant of Lemma 3.1 in [7] (but see Remark 2.1 for an explanation of a subtle
difference). It simply states that the supremum in (7) is attained at a $k$-tuple of functions which correspond
to a partition of $\mathbb{R}^n$.

Lemma 2.1. There exist disjoint measurable sets $A_1,\ldots,A_k\subseteq\mathbb{R}^n$ such that
$A_1\cup A_2\cup\cdots\cup A_k=\mathbb{R}^n$ and

$$\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{A_i}x\,d\gamma_n(x),\int_{A_j}x\,d\gamma_n(x)\right\rangle=C(n,B).$$
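Proof. Define $\Psi:\Delta_k(\gamma_n)\to\mathbb{R}$ by

$$\Psi(f_1,\ldots,f_k):=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{\mathbb{R}^n}xf_i(x)\,d\gamma_n(x),\int_{\mathbb{R}^n}xf_j(x)\,d\gamma_n(x)\right\rangle. \tag{8}$$

We first observe that $\Psi$ is a convex function. Indeed, fix $\lambda\in[0,1]$ and
$(f_1,\ldots,f_k),(g_1,\ldots,g_k)\in\Delta_k(\gamma_n)$. Denote $z_i:=\int_{\mathbb{R}^n}xf_i(x)\,d\gamma_n(x)$
and $w_i:=\int_{\mathbb{R}^n}xg_i(x)\,d\gamma_n(x)$ for every $i\in\{1,\ldots,k\}$. Then:

$$\begin{aligned}
&\lambda\Psi(f_1,\ldots,f_k)+(1-\lambda)\Psi(g_1,\ldots,g_k)-\Psi(\lambda f_1+(1-\lambda)g_1,\ldots,\lambda f_k+(1-\lambda)g_k)\\
&\quad=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\Big(\lambda\langle z_i,z_j\rangle+(1-\lambda)\langle w_i,w_j\rangle-\langle\lambda z_i+(1-\lambda)w_i,\lambda z_j+(1-\lambda)w_j\rangle\Big)\\
&\quad=\lambda(1-\lambda)\sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j\rangle\langle z_i-w_i,z_j-w_j\rangle\\
&\quad=\lambda(1-\lambda)\left\|\sum_{i=1}^k v_i\otimes(z_i-w_i)\right\|_2^2\ge 0.
\end{aligned}$$

Since $\Delta_k(\gamma_n)$ is a weakly compact subset of $H_k$ and $\Psi$ is weakly continuous and convex, $\Psi$
attains its maximum (which equals $C(n,B)$) on $\Delta_k(\gamma_n)$ at an extreme point of $\Delta_k(\gamma_n)$,
say at $(f_1^*,\ldots,f_k^*)\in\Delta_k(\gamma_n)$. It follows that there exist measurable sets
$A_1,\ldots,A_k\subseteq\mathbb{R}^n$ which form a partition of $\mathbb{R}^n$ such that
$(f_1^*,\ldots,f_k^*)=(\mathbf{1}_{A_1},\ldots,\mathbf{1}_{A_k})$ almost everywhere^4, as required. □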

Remark 2.1. In [7] a stronger result was proved when $B=I_k$ (the $k\times k$ identity matrix). Namely, using
the notation of the proof of Lemma 2.1, it was shown that the maximum of $\Psi$ on the larger convex set

$$\widetilde{\Delta}_k(\gamma_n):=\left\{(f_1,\ldots,f_k)\in H_k:\ \forall j\in\{1,\ldots,k\}\ f_j\ge 0\ \wedge\ \sum_{j=1}^k f_j\le 1\right\}$$

is also attained at $(f_1^*,\ldots,f_k^*)=(\mathbf{1}_{A_1},\ldots,\mathbf{1}_{A_k})$ for some measurable sets
$A_1,\ldots,A_k\subseteq\mathbb{R}^n$ which form a partition of $\mathbb{R}^n$. It turns out that this stronger
fact helps to slightly simplify the proof of the corresponding UGC hardness result. However, we do not know how
to prove this stronger statement for general $B$, so we formulated the weaker statement in Lemma 2.1 at the cost
of needing to modify our proof of the UGC hardness result for general $B$ in Section 5.

The same extreme point argument as in the proof of Lemma 2.1 shows that the maximum of $\Psi$ on
$\widetilde{\Delta}_k(\gamma_n)$ is attained at $(f_1^*,\ldots,f_k^*)=(\mathbf{1}_{A_1},\ldots,\mathbf{1}_{A_k})$
for some disjoint measurable sets $A_1,\ldots,A_k\subseteq\mathbb{R}^n$, but now it does not follow that they
necessarily cover all of $\mathbb{R}^n$. When $B=I_k$ it can be shown as in [7] that these sets do cover
$\mathbb{R}^n$. The same statement is true when $B$ is diagonal, as we now show by arguing as in the proof in [7],
but we do not know if it is true for general $B$. So, assume that $B$ is diagonal with positive diagonal entries
$(b_1,\ldots,b_k)$. Let $A=\mathbb{R}^n\setminus\bigcup_{j=1}^k A_j$. Denote $z_j:=\int_{A_j}x\,d\gamma_n(x)$ and
$w:=\int_A x\,d\gamma_n(x)$. Note that $w+z_1+\cdots+z_k=0$. If $w=0$ then $\Psi$ attains its maximum on the
partition $\{A\cup A_1,A_2,\ldots,A_k\}$, so assume for the sake of contradiction that $w\neq 0$. For every
$i\in\{1,\ldots,k\}$ we have:

$$\sum_{j=1}^k b_j\|z_j\|_2^2=\Psi(\mathbf{1}_{A_1},\ldots,\mathbf{1}_{A_k})\ge\Psi(\mathbf{1}_{A_1},\ldots,\mathbf{1}_{A_{i-1}},\mathbf{1}_{A_i\cup A},\mathbf{1}_{A_{i+1}},\ldots,\mathbf{1}_{A_k})=b_i\|z_i+w\|_2^2+\sum_{\substack{1\le j\le k\\ j\neq i}}b_j\|z_j\|_2^2.$$

Thus $2\langle z_i,w\rangle+\|w\|_2^2\le 0$, and if we sum this inequality over $i\in\{1,\ldots,k\}$ while
recalling that $w=-\sum_{i=1}^k z_i$ we see that $(k-2)\|w\|_2^2\le 0$, which is a contradiction. Note that for
general $B$ the same argument shows that for all $i\in\{1,\ldots,k\}$ we have
$2\sum_{j=1}^k b_{ij}\langle z_j,w\rangle+b_{ii}\|w\|_2^2\le 0$. These inequalities do not seem to lend
themselves to the same type of easy contradiction as in the case of diagonal matrices. ◁

^4 To see this standard fact observe that otherwise there would be some $A\subseteq\mathbb{R}^n$ of positive
measure, $\varepsilon\in(0,1/2)$, and distinct $i,j\in\{1,\ldots,k\}$ such that
$f_i^*\mathbf{1}_A,f_j^*\mathbf{1}_A\in(\varepsilon,1-\varepsilon)$. But $(f_1^*,\ldots,f_k^*)$ would then not be
an extreme point since it is the average of
$(g_1,\ldots,g_k),(h_1,\ldots,h_k)\in\Delta_k(\gamma_n)\setminus\{(f_1^*,\ldots,f_k^*)\}$, where
$g_\ell=h_\ell=f_\ell^*$ for $\ell\in\{1,\ldots,k\}\setminus\{i,j\}$ and
$g_i=(f_i^*+\varepsilon)\mathbf{1}_A+f_i^*\mathbf{1}_{\mathbb{R}^n\setminus A}$,
$h_i=(f_i^*-\varepsilon)\mathbf{1}_A+f_i^*\mathbf{1}_{\mathbb{R}^n\setminus A}$,
$g_j=(f_j^*-\varepsilon)\mathbf{1}_A+f_j^*\mathbf{1}_{\mathbb{R}^n\setminus A}$,
$h_j=(f_j^*+\varepsilon)\mathbf{1}_A+f_j^*\mathbf{1}_{\mathbb{R}^n\setminus A}$.

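The proof of the following lemma is an obvious modification of the proof of Lemma 3.2 in [7].

Lemma 2.2. If $n\ge k-1$ then $C(n,B)=C(k-1,B)$.

Proof. The inequality $C(n,B)\ge C(k-1,B)$ is easy since for every $(f_1,\ldots,f_k)\in\Delta_k(\gamma_{k-1})$ we
can define $(\tilde f_1,\ldots,\tilde f_k)\in\Delta_k(\gamma_n)$ by $\tilde f_j(x,y)=f_j(x)$ (thinking here of
$\mathbb{R}^n$ as $\mathbb{R}^{k-1}\times\mathbb{R}^{n-k+1}$). Then for all $j\in\{1,\ldots,k\}$ we have
$\int_{\mathbb{R}^{k-1}}xf_j(x)\,d\gamma_{k-1}(x)=\int_{\mathbb{R}^n}x\tilde f_j(x)\,d\gamma_n(x)$, implying that
$\Psi(\tilde f_1,\ldots,\tilde f_k)=\Psi(f_1,\ldots,f_k)$.

In the reverse direction, by Lemma 2.1 there is a measurable partition $A_1,\ldots,A_k$ of $\mathbb{R}^n$ such
that if we define $z_j:=\int_{A_j}x\,d\gamma_n(x)\in\mathbb{R}^n$ then we have
$\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle=C(n,B)$. Note that
$\sum_{j=1}^k z_j=\int_{\mathbb{R}^n}x\,d\gamma_n(x)=0$. Hence the dimension of the subspace
$V:=\mathrm{span}\{z_1,\ldots,z_k\}$ is $d\le k-1$. Define $g_1,\ldots,g_k:V\to[0,1]$ by
$g_j(x)=\gamma_{V^\perp}\left((A_j-x)\cap V^\perp\right)$. Then $(g_1,\ldots,g_k)\in\Delta_k(\gamma_V)$, so that

$$\begin{aligned}
C(k-1,B)\ge C(d,B)&\ge\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_V xg_i(x)\,d\gamma_V(x),\int_V xg_j(x)\,d\gamma_V(x)\right\rangle\\
&=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_V\int_{V^\perp}\mathbf{1}_{A_i}(x+y)x\,d\gamma_V(x)d\gamma_{V^\perp}(y),\int_V\int_{V^\perp}\mathbf{1}_{A_j}(x+y)x\,d\gamma_V(x)d\gamma_{V^\perp}(y)\right\rangle\\
&=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{A_i}\mathrm{Proj}_V(w)\,d\gamma_n(w),\int_{A_j}\mathrm{Proj}_V(w)\,d\gamma_n(w)\right\rangle\\
&=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle\mathrm{Proj}_V(z_i),\mathrm{Proj}_V(z_j)\rangle\\
&=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle=C(n,B),
\end{aligned}$$

as required. □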



In light of Lemma 2.2 we define $C(B):=C(k-1,B)$. We shall now prove an analogue of Lemma 3.3
in [7] which gives structural information on the partition $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ at which
$C(B)$ is attained. We first recall some notation and terminology from [7]. Given distinct
$z_1,\ldots,z_k\in\mathbb{R}^{k-1}$ and $j\in\{1,\ldots,k\}$ define a set
$P_j(z_1,\ldots,z_k)\subseteq\mathbb{R}^{k-1}$ by

$$P_j(z_1,\ldots,z_k):=\left\{x\in\mathbb{R}^{k-1}:\ \langle x,z_j\rangle=\max_{i\in\{1,\ldots,k\}}\langle x,z_i\rangle\right\}.$$

Thus $\{P_j(z_1,\ldots,z_k)\}_{j=1}^k$ is a partition of $\mathbb{R}^{k-1}$ which we call the simplicial
partition induced by $z_1,\ldots,z_k$ (strictly speaking the elements of this partition are not disjoint, but
they intersect at sets of measure 0).
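The Gaussian moments of a simplicial partition, and the resulting value of the double sum that defines C(B), can be estimated by Monte Carlo sampling. The sketch below is an illustration under assumptions (the function names, the sample size, and the use of plain sampling are choices made here, not the paper's method): each sample is assigned to the region whose inducing vector has the largest inner product with it.

```python
import numpy as np

def simplicial_moments(Z, num_samples, rng):
    """Monte Carlo estimate of the Gaussian moments of the simplicial partition induced by Z.

    Z : (k, d) array whose rows z_1, ..., z_k induce the regions
        P_j = {x : <x, z_j> = max_i <x, z_i>} in R^d.
    Returns a (k, d) array whose j-th row estimates the integral of x over P_j
    with respect to the standard Gaussian measure on R^d.
    """
    k, d = Z.shape
    X = rng.standard_normal((num_samples, d))
    labels = np.argmax(X @ Z.T, axis=1)  # index of the region containing each sample
    M = np.zeros((k, d))
    np.add.at(M, labels, X)              # accumulate the samples falling in each region
    return M / num_samples

def psi_value(B, M):
    """sum_{i,j} b_ij <m_i, m_j> for the moment vectors m_i given by the rows of M."""
    return float(np.sum(np.asarray(B, dtype=float) * (M @ M.T)))
```

For k = 2 and B the identity, the partition of the real line into two half-lines (induced by z_1 = 1 and z_2 = -1) gives moments close to ±1/sqrt(2π) and a value close to 1/π.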

Lemma 2.3. Let $A_1,\ldots,A_k\subseteq\mathbb{R}^{k-1}$ be a partition into measurable sets such that if we set
$z_j:=\int_{A_j}x\,d\gamma_{k-1}(x)$ then

$$C(B)=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle. \tag{9}$$

Assume also that this partition is minimal in the sense that the number of elements of positive measure in
this partition is minimum among all the possible partitions satisfying (9). Define

$$J:=\{j\in\{1,\ldots,k\}:\ \gamma_{k-1}(A_j)>0\}$$

and set $|J|=\ell$. Then up to an orthogonal transformation $\{z_j\}_{j\in J}\subseteq\mathbb{R}^{\ell-1}$ and
the vectors $\{z_j\}_{j\in J}$ are non-zero and distinct. Moreover, if we define
$\{w_j\}_{j\in J}\subseteq\mathbb{R}^{\ell-1}$ by

$$w_j:=\sum_{s\in J}b_{js}z_s, \tag{10}$$

then the vectors $\{w_j\}_{j\in J}$ are distinct and for each $j\in J$ we have

$$A_j=P_j\left((w_i)_{i\in J}\right)\times\mathbb{R}^{k-\ell} \tag{11}$$

up to sets of measure zero.

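Proof. Since $\sum_{j\in J}\mathbf{1}_{A_j}=1$ almost everywhere we have $\sum_{j\in J}z_j=0$. Thus the dimension
of the span of $\{z_j\}_{j\in J}$ is at most $|J|-1=\ell-1$, and by applying an orthogonal transformation we may
assume that $\{z_j\}_{j\in J}\subseteq\mathbb{R}^{\ell-1}$. Also, for every distinct $i,j\in J$ replace $A_i$ by
$A_i\cup A_j$ and $A_j$ by the empty set and obtain a partition of $\mathbb{R}^{k-1}$ which contains exactly
$\ell-1$ elements of positive measure and for which we have (by the minimality of $\ell$):

$$\begin{aligned}
C(B)&>\sum_{s,t\in J\setminus\{i,j\}}b_{st}\langle z_s,z_t\rangle+2\sum_{s\in J\setminus\{i,j\}}b_{is}\langle z_s,z_i+z_j\rangle+b_{ii}\|z_i+z_j\|_2^2\\
&=\sum_{s,t\in J}b_{st}\langle z_s,z_t\rangle+2\sum_{s\in J}(b_{is}-b_{js})\langle z_s,z_j\rangle+(b_{ii}+b_{jj}-2b_{ij})\|z_j\|_2^2\\
&=C(B)+2\langle w_i-w_j,z_j\rangle+\|v_i-v_j\|_2^2\cdot\|z_j\|_2^2,
\end{aligned}$$

where we used the fact that $b_{st}=\langle v_s,v_t\rangle$. Thus

$$2\langle w_i-w_j,z_j\rangle+\|v_i-v_j\|_2^2\cdot\|z_j\|_2^2<0, \tag{12}$$

and by symmetry we also have the inequality:

$$2\langle w_j-w_i,z_i\rangle+\|v_i-v_j\|_2^2\cdot\|z_i\|_2^2<0. \tag{13}$$

It follows in particular from (12) and (13) that $z_i$ and $z_j$ are non-zero and that $w_i\neq w_j$. Moreover if
we sum (12) and (13) we get that

$$2\langle w_i-w_j,z_j-z_i\rangle+\|v_i-v_j\|_2^2\left(\|z_i\|_2^2+\|z_j\|_2^2\right)<0,$$

which implies that $z_i\neq z_j$.

The above reasoning implies in particular that
$\left\{P_j\left((w_i)_{i\in J}\right)\times\mathbb{R}^{k-\ell}\right\}_{j\in J}$ is a partition of
$\mathbb{R}^{k-1}$ (up to pairwise intersections at sets of measure 0). Assume for the sake of contradiction that
there exists $i\in J$ such that

$$\gamma_{k-1}\left(A_i\setminus\left(P_i\left((w_s)_{s\in J}\right)\times\mathbb{R}^{k-\ell}\right)\right)>0.$$

Arguing as in the proof of Lemma 3.3 in [7] we see that there exists $\varepsilon>0$ and $j\in J\setminus\{i\}$
such that if we denote $E:=\{x\in A_i:\ \langle x,w_j\rangle\ge\langle x,w_i\rangle+\varepsilon\}$ then
$\gamma_{k-1}(E)>0$.

Define a partition $\widetilde{A}_1,\ldots,\widetilde{A}_k$ of $\mathbb{R}^{k-1}$ by

$$\widetilde{A}_r:=\begin{cases}A_r & r\notin\{i,j\},\\ A_i\setminus E & r=i,\\ A_j\cup E & r=j.\end{cases}$$

Then for $w:=\int_E x\,d\gamma_{k-1}(x)$ we have

$$\begin{aligned}
C(B)&\ge\sum_{s,t\in J}b_{st}\left\langle\int_{\widetilde{A}_s}x\,d\gamma_{k-1}(x),\int_{\widetilde{A}_t}x\,d\gamma_{k-1}(x)\right\rangle\\
&=\sum_{s,t\in J\setminus\{i,j\}}b_{st}\langle z_s,z_t\rangle+2\sum_{s\in J\setminus\{i,j\}}b_{is}\langle z_s,z_i-w\rangle+2\sum_{s\in J\setminus\{i,j\}}b_{js}\langle z_s,z_j+w\rangle\\
&\qquad+2b_{ij}\langle z_i-w,z_j+w\rangle+b_{ii}\|z_i-w\|_2^2+b_{jj}\|z_j+w\|_2^2\\
&=C(B)-2\sum_{s\in J}b_{is}\langle z_s,w\rangle+2\sum_{s\in J}b_{js}\langle z_s,w\rangle+(b_{ii}+b_{jj}-2b_{ij})\|w\|_2^2\\
&=C(B)+2\langle w_j-w_i,w\rangle+\|v_i-v_j\|_2^2\cdot\|w\|_2^2\\
&\ge C(B)+2\int_E\left(\langle w_j,x\rangle-\langle w_i,x\rangle\right)d\gamma_{k-1}(x)\\
&\ge C(B)+2\varepsilon\gamma_{k-1}(E)>C(B),
\end{aligned}$$

a contradiction. □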



Remark 2.2. Note that we have the following non-trivial identity as a corollary of Lemma 2.3 (and using
the same notation): For each $i\in J$,

$$z_i=\int_{P_i\left((w_j)_{j\in J}\right)}x\,d\gamma_{\ell-1}(x), \tag{14}$$

where we recall that the $w_i$ are defined in (10). This system of equalities seems to contain non-trivial
information on the structure of the partition at which $C(B)$ is attained. In future research it would be of
interest to exploit this information, though we have no need for it for our present purposes. ◁

Remark 2.3. Given $B$ and $\varepsilon>0$ we can estimate $C(B)$ up to an error of at most $\varepsilon$ in
constant time (which depends only on $B$, $k$, $\varepsilon$). Moreover, we can compute in constant time a conical
simplicial partition of $\mathbb{R}^{k-1}$ at which the value of $\Psi$ is at least $C(B)-\varepsilon$. These
statements are a simple corollary of Lemma 2.3. Indeed, all we have to do is to run over all choices of
$\ell\in\{1,\ldots,k\}$ and for each such $\ell$ construct an appropriate net of
$z_1,\ldots,z_\ell\in\mathbb{R}^{\ell-1}$ of bounded size, and then check each of the induced simplicial
partitions of $\mathbb{R}^{\ell-1}$ as in (11) for the one which maximizes $\Psi$. To this end we need some a
priori bound on the length of $z_i$: the crude bound

$$\|z_i\|_2=\left\|\int_{A_i}x\,d\gamma_{\ell-1}(x)\right\|_2\le\int_{\mathbb{R}^{\ell-1}}\|x\|_2\,d\gamma_{\ell-1}(x)\le\sqrt{\ell}$$

will suffice. Fix $\delta>0$ which will be determined momentarily. Let $N$ be a $\delta$-net in the Euclidean ball of

radius V^in R^-\ Then \N\ < (^] ■ 

Let $A_1,\ldots,A_k$ be as in Lemma 2.3, i.e., the true (minimal) partition at which $C(B)$ is attained. Let $J$,
$\ell$, $z_i$ and $w_i$ be as in Lemma 2.3. For each $i\in J$ find $z_i'\in N$ for which
$\|z_i-z_i'\|_2\le\delta$. Define $w_i'=\sum_{s\in J}b_{is}z_s'$. Then we have the crude bound
$\|w_i-w_i'\|_2\le\delta\sum_{s=1}^k\sum_{t=1}^k|b_{st}|=:\delta\|B\|_1$. We also have the a priori bounds
$\|w_i\|_2,\|w_i'\|_2\le\sqrt{\ell}\,\|B\|_1$. By compactness there exists
$\delta=\delta(\varepsilon,\ell,B)$ such that these estimates imply that for all $j\in J$,

$$\left\|z_j-\int_{P_j\left((w_i')_{i\in J}\right)}x\,d\gamma_{\ell-1}(x)\right\|_2=\left\|\int_{P_j\left((w_i)_{i\in J}\right)}x\,d\gamma_{\ell-1}(x)-\int_{P_j\left((w_i')_{i\in J}\right)}x\,d\gamma_{\ell-1}(x)\right\|_2\le\frac{\varepsilon}{2\sqrt{\ell}\,\|B\|_1}. \tag{15}$$

(It is actually easy to give a concrete bound on the required $\delta$ if so desired, but this is not important
for our purposes.) It follows from (15) that:

$$C(B)\ge\sum_{s,t\in J}b_{st}\left\langle\int_{P_s\left((w_i')_{i\in J}\right)}x\,d\gamma_{\ell-1}(x),\int_{P_t\left((w_i')_{i\in J}\right)}x\,d\gamma_{\ell-1}(x)\right\rangle\ge C(B)-\|B\|_1\cdot\sqrt{\ell}\cdot\frac{\varepsilon}{2\sqrt{\ell}\,\|B\|_1}\cdot 2=C(B)-\varepsilon.$$

Note that the above integrals can be estimated efficiently (polynomial time in $k$) with arbitrarily good
precision due to the fact that the simplicial cones $P_j\left((w_i')_{i\in J}\right)$ have an efficient
membership oracle and the Gaussian measure is log-concave. These are very crude bounds that suffice for our
algorithmic purposes when $k$ is fixed, but deteriorate exponentially with $k$. It would be of interest to
understand whether we can estimate $C(B)$ (and more importantly the associated partitions, as they are used in our
rounding procedure) in time which is polynomial in $k$. Perhaps the identities (14) can play a role in the design
of such an efficient algorithm, but we did not investigate this issue. ◁

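We end this section with a simple analytic interpretation of the parameter $C(B)$. Given a square integrable
function $f:\mathbb{R}^n\to\mathbb{R}^k$ its Rademacher projection
$\mathrm{Rad}(f):\mathbb{R}^n\to\mathbb{R}^k$ (see [10] for an explanation of this terminology) is defined for
$x=(x_1,\ldots,x_n)\in\mathbb{R}^n$ as:

$$\mathrm{Rad}(f)(x)=\sum_{i=1}^n\left(\int_{\mathbb{R}^n}y_if(y)\,d\gamma_n(y)\right)x_i.$$

Assume that $f$ takes values in $\{v_1,\ldots,v_k\}\subseteq\mathbb{R}^k$ and define $A_i=f^{-1}(v_i)$ for
$i\in\{1,\ldots,k\}$. Then $\{A_1,\ldots,A_k\}$ is a measurable partition of $\mathbb{R}^n$. We also have the
identity:

$$\mathrm{Rad}(f)(x)=\sum_{i=1}^n\left(\sum_{j=1}^k v_j\int_{A_j}y_i\,d\gamma_n(y)\right)x_i.$$

Thus

$$\begin{aligned}
\|\mathrm{Rad}(f)\|_{L_2(\gamma_n,\mathbb{R}^k)}^2&=\int_{\mathbb{R}^n}\|\mathrm{Rad}(f)(x)\|_2^2\,d\gamma_n(x)=\sum_{i=1}^n\left\|\sum_{j=1}^k v_j\int_{A_j}y_i\,d\gamma_n(y)\right\|_2^2\\
&=\sum_{i=1}^n\sum_{j=1}^k\sum_{\ell=1}^k\langle v_j,v_\ell\rangle\left(\int_{A_j}y_i\,d\gamma_n(y)\right)\left(\int_{A_\ell}y_i\,d\gamma_n(y)\right)=\sum_{j=1}^k\sum_{\ell=1}^k b_{j\ell}\left\langle\int_{A_j}y\,d\gamma_n(y),\int_{A_\ell}y\,d\gamma_n(y)\right\rangle.
\end{aligned} \tag{16}$$

The identity (16) implies the following lemma:

Lemma 2.4. For every $n\ge k-1$ we have:

$$C(B)=\max_{f:\mathbb{R}^n\to\{v_1,\ldots,v_k\}}\|\mathrm{Rad}(f)\|_{L_2(\gamma_n,\mathbb{R}^k)}^2.$$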

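Recall that $R(B)$ is defined as the radius of the smallest ball in $\mathbb{R}^k$ which contains the set
$\{v_1,\ldots,v_k\}$ and that $w(B)$ is the center of this ball. Lemma 2.4 implies the following corollary:

Corollary 2.5. $C(B)\le R(B)^2$.

Proof. Let $\{A_1,\ldots,A_k\}$ be a partition of $\mathbb{R}^{k-1}$ into measurable sets such that if we define
$z_j=\int_{A_j}x\,d\gamma_{k-1}(x)$ then

$$C(B)=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle=\sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j\rangle\langle z_i,z_j\rangle. \tag{17}$$

Since $\sum_{j=1}^k z_j=0$ it follows from (16) and (17) that for $f:\mathbb{R}^{k-1}\to\mathbb{R}^k$ defined by
$f|_{A_i}=v_i-w(B)$ we have:

$$C(B)=\|\mathrm{Rad}(f)\|_{L_2(\gamma_{k-1},\mathbb{R}^k)}^2\stackrel{(\star)}{\le}\|f\|_{L_2(\gamma_{k-1},\mathbb{R}^k)}^2\le\|f\|_{L_\infty(\gamma_{k-1},\mathbb{R}^k)}^2=\max_{i\in\{1,\ldots,k\}}\|v_i-w(B)\|_2^2=R(B)^2,$$

where in $(\star)$ we used the fact that Rad is an orthogonal projection on the Hilbert space
$L_2(\gamma_n,\mathbb{R}^k)$. □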



3 Generalized positive semidefinite Grothendieck inequalities 

The purpose of this section is to prove the following theorem, which as explained in the introduction, is an extension of Grothendieck's inequality for positive semidefinite matrices.

Theorem 3.1. Let $A=(a_{ij})\in M_n(\mathbb{R})$ be an $n\times n$ symmetric positive semidefinite matrix. Let $v_1,\ldots,v_k\in\mathbb{R}^k$ be $k\ge 2$ vectors and let $B=(b_{ij}=\langle v_i,v_j\rangle)$ be the corresponding Gram matrix. Then

$$\max_{x_1,\ldots,x_n\in S^{n-1}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\le\frac{1}{C(B)}\max_{\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma(i)},v_{\sigma(j)}\rangle. \qquad (18)$$

We shall prove in Section 3.1 that the factor $\frac{1}{C(B)}$ in (18) cannot be improved, even when in (18) $A$ is restricted to be centered, i.e., $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$.

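The key tool in the proof of Theorem 3.1 is the following lemma:

Lemma 3.2. Let $\{g_{ij}: i\in\{1,\ldots,m\},\ j\in\{1,\ldots,n\}\}$ be i.i.d. standard Gaussian random variables and let $G=(g_{ij})$ be the corresponding $m\times n$ random Gaussian matrix. Fix two unit vectors $x,y\in S^{n-1}$ and two measurable subsets $E,F\subseteq\mathbb{R}^m$. Then:

$$\Pr[Gx\in E\wedge Gy\in F]=\gamma_m(E)\gamma_m(F)+\langle x,y\rangle\left\langle\int_E u\,d\gamma_m(u),\int_F u\,d\gamma_m(u)\right\rangle+\sum_{\ell=2}^\infty\langle x^{\otimes\ell},y^{\otimes\ell}\rangle\sum_{\substack{s\in(\mathbb{N}\cup\{0\})^m\\ s_1+\cdots+s_m=\ell}}\alpha_s(E)\alpha_s(F), \qquad (19)$$

for some real coefficients $\{\alpha_s(E)\}_{s\in(\mathbb{N}\cup\{0\})^m},\{\alpha_s(F)\}_{s\in(\mathbb{N}\cup\{0\})^m}\subseteq\mathbb{R}$.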



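Proof. Denote $r=\langle x,y\rangle$. Let $g,h\in\mathbb{R}$ be independent standard Gaussian random variables and let $g_1,\ldots,g_m\in\mathbb{R}^n$ be i.i.d. standard Gaussian random vectors in $\mathbb{R}^n$ (i.e., they are independent and distributed according to $\gamma_n$). Then for each $i\in\{1,\ldots,m\}$ the planar random vector $(\langle g_i,x\rangle,\langle g_i,y\rangle)\in\mathbb{R}^2$ has the same distribution as $\left(g,\,rg+\sqrt{1-r^2}\,h\right)\in\mathbb{R}^2$, and hence its density is given for $(u,v)\in\mathbb{R}^2$ by:

$$f_r(u,v)=\frac{1}{2\pi\sqrt{1-r^2}}\exp\left(-\frac{u^2-2ruv+v^2}{2(1-r^2)}\right).$$

The Hermite polynomials are defined as:

$$H_s(t)=(-1)^s e^{t^2}\frac{d^s}{dt^s}\left(e^{-t^2}\right)=\sum_{j=0}^{\lfloor s/2\rfloor}\frac{(-1)^j s!}{j!\,(s-2j)!}(2t)^{s-2j}.$$

The formula for the Poisson kernel for Hermite polynomials (see for example equation 6.1.13 in U or the discussion in [14]) says that

$$f_r(u,v)=\frac{e^{-(u^2+v^2)/2}}{2\pi}\sum_{s=0}^\infty\frac{r^s}{2^s s!}H_s\left(\frac{u}{\sqrt 2}\right)H_s\left(\frac{v}{\sqrt 2}\right).$$

Since the vector $(Gx,Gy)\in\mathbb{R}^{2m}$ has the same distribution as the vector $\left((\langle g_i,x\rangle,\langle g_i,y\rangle)\right)_{i=1}^m$, whose (planar) entries are i.i.d. with density $f_r$, we see that:

$$\Pr[Gx\in E\wedge Gy\in F]=\int_{E\times F}\prod_{i=1}^m f_r(u_i,v_i)\,du\,dv$$
$$=\int_{E\times F}\frac{e^{-(\|u\|_2^2+\|v\|_2^2)/2}}{(2\pi)^m}\prod_{i=1}^m\left(\sum_{s=0}^\infty\frac{r^s}{2^s s!}H_s\left(\frac{u_i}{\sqrt 2}\right)H_s\left(\frac{v_i}{\sqrt 2}\right)\right)du\,dv$$
$$=\sum_{s\in(\mathbb{N}\cup\{0\})^m}r^{s_1+\cdots+s_m}\int_{E\times F}\prod_{i=1}^m\frac{H_{s_i}(u_i/\sqrt 2)\,H_{s_i}(v_i/\sqrt 2)}{2^{s_i}s_i!}\,d\gamma_m(u)\,d\gamma_m(v)$$
$$=\gamma_m(E)\gamma_m(F)+\langle x,y\rangle\left\langle\int_E u\,d\gamma_m(u),\int_F u\,d\gamma_m(u)\right\rangle+\sum_{\ell=2}^\infty\langle x^{\otimes\ell},y^{\otimes\ell}\rangle\sum_{\substack{s\in(\mathbb{N}\cup\{0\})^m\\ s_1+\cdots+s_m=\ell}}\alpha_s(E)\alpha_s(F),$$

where we used the fact that $H_0(t)=1$ and $H_1(t)=2t$, the identity $r^\ell=\langle x^{\otimes\ell},y^{\otimes\ell}\rangle$, and for every measurable subset $W\subseteq\mathbb{R}^m$ and $s\in(\mathbb{N}\cup\{0\})^m$ the notation

$$\alpha_s(W):=\int_W\prod_{i=1}^m\frac{H_{s_i}(u_i/\sqrt 2)}{\sqrt{2^{s_i}s_i!}}\,d\gamma_m(u).$$

The proof of the identity (19) is complete. □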
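As a numerical sanity check of the Hermite expansion of the density $f_r$ used in the proof above, the following Python sketch compares the closed form with a truncated series. The function names and the truncation length `terms` are illustrative choices, not taken from the paper.

```python
import math

def hermite(s, t):
    """Physicists' Hermite polynomial H_s(t), via H_{n+1} = 2t H_n - 2n H_{n-1}."""
    h_prev, h = 1.0, 2.0 * t
    if s == 0:
        return h_prev
    for n in range(1, s):
        h_prev, h = h, 2.0 * t * h - 2.0 * n * h_prev
    return h

def density_closed_form(r, u, v):
    """Density of (g, r*g + sqrt(1 - r^2)*h) for independent standard Gaussians g, h."""
    q = (u * u - 2.0 * r * u * v + v * v) / (2.0 * (1.0 - r * r))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def density_hermite_series(r, u, v, terms=60):
    """Truncated Hermite (Poisson kernel) expansion of the same density."""
    total = 0.0
    for s in range(terms):
        coeff = r ** s / (2.0 ** s * math.factorial(s))
        total += coeff * hermite(s, u / math.sqrt(2.0)) * hermite(s, v / math.sqrt(2.0))
    return math.exp(-(u * u + v * v) / 2.0) / (2.0 * math.pi) * total
```

For moderate $|r|$ the series converges geometrically, so a few dozen terms already agree with the closed form to near machine precision.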

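Proof of Theorem 3.1. Fix $n$ unit vectors $x_1,\ldots,x_n\in S^{n-1}$. Let $\{A_1,\ldots,A_k\}$ be a partition of $\mathbb{R}^{k-1}$ into measurable subsets. Let $G$ be a random Gaussian matrix as in Lemma 3.2 with $m=k-1$. Define a random assignment $\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}$ by setting $\sigma(i)$ to be the unique $p\in\{1,\ldots,k\}$ for which $Gx_i\in A_p$. Then for every $i,j\in\{1,\ldots,n\}$ we have

$$\mathbb{E}\left[\langle v_{\sigma(i)},v_{\sigma(j)}\rangle\right]=\sum_{p=1}^k\sum_{q=1}^k\langle v_p,v_q\rangle\Pr[Gx_i\in A_p\wedge Gx_j\in A_q]=\sum_{p=1}^k\sum_{q=1}^k b_{pq}\Pr[Gx_i\in A_p\wedge Gx_j\in A_q].$$

We may therefore apply Lemma 3.2 to deduce that:

$$\mathbb{E}\left[\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma(i)},v_{\sigma(j)}\rangle\right]=\left(\sum_{i=1}^n\sum_{j=1}^n a_{ij}\right)\left(\sum_{p=1}^k\sum_{q=1}^k b_{pq}\gamma_{k-1}(A_p)\gamma_{k-1}(A_q)\right)$$
$$+\left(\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\right)\left(\sum_{p=1}^k\sum_{q=1}^k b_{pq}\left\langle\int_{A_p}u\,d\gamma_{k-1}(u),\int_{A_q}u\,d\gamma_{k-1}(u)\right\rangle\right)$$
$$+\sum_{\ell=2}^\infty\left(\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i^{\otimes\ell},x_j^{\otimes\ell}\rangle\right)\sum_{\substack{s\in(\mathbb{N}\cup\{0\})^{k-1}\\ s_1+\cdots+s_{k-1}=\ell}}\left(\sum_{p=1}^k\sum_{q=1}^k b_{pq}\alpha_s(A_p)\alpha_s(A_q)\right)$$
$$\ge\left(\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\right)\sum_{p=1}^k\sum_{q=1}^k b_{pq}\left\langle\int_{A_p}u\,d\gamma_{k-1}(u),\int_{A_q}u\,d\gamma_{k-1}(u)\right\rangle,$$

where we used the fact that both $A$ and $B$ are positive semidefinite. It thus follows that there exists an assignment $\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}$ for which

$$\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma(i)},v_{\sigma(j)}\rangle\ge\left(\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\right)\sum_{p=1}^k\sum_{q=1}^k b_{pq}\left\langle\int_{A_p}u\,d\gamma_{k-1}(u),\int_{A_q}u\,d\gamma_{k-1}(u)\right\rangle,$$

and since this is true for all measurable partitions $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ we deduce that there exists an assignment $\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}$ for which:

$$\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma(i)},v_{\sigma(j)}\rangle\ge C(B)\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle,$$

as required. □

3.1 Optimality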

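The purpose of this section is to show that Theorem 3.1 is sharp:

Theorem 3.3. Let $v_1,\ldots,v_k\in\mathbb{R}^k$ be $k\ge 2$ vectors and let $B=(b_{ij}=\langle v_i,v_j\rangle)$ be the corresponding Gram matrix. Assume that $K>0$ is a constant such that for every $n\in\mathbb{N}$ and every centered symmetric positive semidefinite matrix $A=(a_{ij})\in M_n(\mathbb{R})$ we have:

$$\max_{x_1,\ldots,x_n\in S^{n-1}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle\le K\max_{\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma(i)},v_{\sigma(j)}\rangle. \qquad (20)$$

Then $K\ge\frac{1}{C(B)}$.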



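Proof. The proof consists of a discretization of a continuous example. The discretization step is somewhat tedious, but straightforward. We will start with a presentation of the continuous example. Fix $m\in\mathbb{N}$ and let $g,h\in\mathbb{R}^m$ be independent standard Gaussian random vectors. Since $(\|g\|_2,\|h\|_2)$ is independent of $\left(\frac{g}{\|g\|_2},\frac{h}{\|h\|_2}\right)$ we have:

$$\int_{\mathbb{R}^m\times\mathbb{R}^m}\left\langle\frac{x}{\|x\|_2},\frac{y}{\|y\|_2}\right\rangle\langle x,y\rangle\,d\gamma_m(x)\,d\gamma_m(y)=\mathbb{E}\left[\|g\|_2\cdot\|h\|_2\right]\cdot\mathbb{E}\left[\left\langle\frac{g}{\|g\|_2},\frac{h}{\|h\|_2}\right\rangle^2\right]$$
$$=\left(\mathbb{E}\|g\|_2\right)^2\cdot\mathbb{E}\left[\frac{h_1^2}{\|h\|_2^2}\right]=\left(\mathbb{E}\|g\|_2\right)^2\cdot\frac{1}{m}\sum_{i=1}^m\mathbb{E}\left[\frac{h_i^2}{\|h\|_2^2}\right]=\frac{1}{m}\left(\mathbb{E}\|g\|_2\right)^2, \qquad (21)$$

where we used the rotation invariance of the distribution of $h$.

The distribution of $\|g\|_2^2$ is the $\chi^2$ distribution with $m$ degrees of freedom, and therefore its density at $u>0$ equals $\frac{1}{2^{m/2}\Gamma(m/2)}u^{\frac{m}{2}-1}e^{-u/2}$. It follows that

$$\mathbb{E}\|g\|_2=\frac{1}{2^{m/2}\Gamma(m/2)}\int_0^\infty\sqrt{u}\cdot u^{\frac{m}{2}-1}e^{-u/2}\,du=\sqrt 2\cdot\frac{\Gamma\left(\frac{m+1}{2}\right)}{\Gamma\left(\frac{m}{2}\right)}\ge\sqrt{m}\left(1-O\left(\frac{1}{m}\right)\right), \qquad (22)$$

where the last step is an application of Stirling's formula. Plugging (22) into (21) we see that:

$$\int_{\mathbb{R}^m\times\mathbb{R}^m}\left\langle\frac{x}{\|x\|_2},\frac{y}{\|y\|_2}\right\rangle\langle x,y\rangle\,d\gamma_m(x)\,d\gamma_m(y)\ge 1-O\left(\frac{1}{m}\right). \qquad (23)$$

Now, assuming that $m\ge k-1$, for every $f:\mathbb{R}^m\to\{v_1,\ldots,v_k\}$ we have

$$\int_{\mathbb{R}^m\times\mathbb{R}^m}\langle x,y\rangle\cdot\langle f(x),f(y)\rangle\,d\gamma_m(x)\,d\gamma_m(y)=\left\|\int_{\mathbb{R}^m}x\otimes f(x)\,d\gamma_m(x)\right\|_2^2=\sum_{i=1}^m\left\|\int_{\mathbb{R}^m}\langle x,e_i\rangle f(x)\,d\gamma_m(x)\right\|_2^2=\|\mathrm{Rad}(f)\|^2_{L_2(\gamma_m,\mathbb{R}^k)}\le C(B), \qquad (24)$$

where we used Lemma 2.4 (and here $e_1,\ldots,e_m$ is the standard basis of $\mathbb{R}^m$).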
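The closed form in (22) and the resulting value $\frac{1}{m}(\mathbb{E}\|g\|_2)^2$ from (21) are easy to evaluate numerically. The sketch below is illustrative only (the helper names are not from the paper); it uses `math.lgamma` to avoid overflow of the Gamma function for large $m$.

```python
import math

def expected_gaussian_norm(m):
    """E||g||_2 for a standard Gaussian vector in R^m: sqrt(2)*Gamma((m+1)/2)/Gamma(m/2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((m + 1) / 2.0) - math.lgamma(m / 2.0))

def normalized_correlation_integral(m):
    """The value (E||g||_2)^2 / m appearing on the right-hand side of (21)."""
    return expected_gaussian_norm(m) ** 2 / m

if __name__ == "__main__":
    for m in (1, 2, 10, 100, 1000):
        # The printed values approach 1 from below as m grows.
        print(m, normalized_correlation_integral(m))
```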

We shall now perform a simple discretization argument to conclude the proof of Theorem 3.3. Fix $\varepsilon>0$ and $M\in\mathbb{N}$. Let $\mathcal{F}$ be the set of all axis parallel cubes in $[-\varepsilon M,\varepsilon M]^m$ which are a product of $m$ intervals whose endpoints are consecutive integer multiples of $\varepsilon$ in $[-\varepsilon M,\varepsilon M]$. Thus $|\mathcal{F}|=(2M)^m$ and each $Q\in\mathcal{F}$ has volume $\varepsilon^m$. For $Q\in\mathcal{F}$ let $z_Q$ be the center of $Q$. For every $P,Q\in\mathcal{F}$ define

apQ 2 {zp,zq). 

By our assumption (20) there is an assignment $\sigma:\mathcal{F}\to\{1,\ldots,k\}$ such that



\lkp|l2 Ikell2/ 



(25) 



We shall now use the following straightforward (and crude) estimates: 



J 



y 



) dy,„{x)dyrn(y) - ^ ^pg 



■^112 imi2 

^' - {zp,zq} 



+ 



P,Qe.^ 
Zp Zq 



ZP Zq 



Ikp|l2'lk(2ll2 

X y 



Ikplb'lkelb 

_IWI5+II.th2 

- e {x,y) 



\X\\2 



dxdy 



(x,y) 



(R'" xR"" )\ ([-eM.eM]'" X [-eM,eM]"' ) 



Wl2 



cfym(^)cf7mCv) 



0{\)ymE[yJmM£) / \ ^ ^ dxdy + 0{\)me 

p,^_^^^xe 



< 0{\)^/mE[^/mM£) + 0(l)mV 
We shall require in what follows that $\varepsilon M=2m$. Hence, using (23) we deduce that:



2 '^PQ 

P,Qe.^ 



Zp Zq 



Ikplb'Ikelb 

{vi,...,VA:)by 

m-- 



> 1 - 6>|m^£+ - . 

m 



On the other hand, define / : W 
Observe that by symmetry 

/ 

J (R"" xR" )\([-£M,£M]'" X [-eM,eM]" ) 

and therefore a similar crude estimate yields: 



v^(Q) xeQe^, 
vi X ^ [-eM, eM]" 



{x,y} ■ {f{x), f(y)) dy,n{x)dy,n{y) = 0, 



I (■^, 3') • f(y)) dym{x)dym(y) - V apg {vo-(p), Vo-(q)) 



_ w|+IMl| 

e 5 <x,j> - e 



P,Qe^ 

{zp,zq} 



\(vo-(P),Vo-{Q)}\dxdy 



(26) 



< Ofm^e) max ||v,||?. 

^ ' ie{l,...,k] 



(27) 
Choosing e = m ^ (and thus M - IrrP), and combining (ITT] ) with (l24l ) and (l26l ). yields in combination 
with (I25]l the bound: 

1 - o|^J < /:|c(B) + 0(3] max IkH^]. 
Letting m — > 00 concludes the proof of Theorem [331 



m/ ie|l, ...,*:) 



4 A sharp approximation algorithm for kernel clustering 

Let $A=(a_{ij})\in M_n(\mathbb{R})$ be a centered symmetric positive semidefinite matrix and let $B=(b_{ij})\in M_k(\mathbb{R})$ be a symmetric positive semidefinite matrix. Our goal is to design a polynomial time algorithm which approximates the value:

$$\mathrm{Clust}(A|B)=\max_{\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}}\sum_{i=1}^n\sum_{j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}.$$

We proceed as follows. We first find vectors $v_1,\ldots,v_k\in\mathbb{R}^k$ such that $b_{ij}=\langle v_i,v_j\rangle$ for all $i,j\in\{1,\ldots,k\}$. This can be done in polynomial time (Cholesky decomposition). Let $R(B)$ be the minimum radius of the Euclidean ball in $\mathbb{R}^k$ that contains $\{v_1,\ldots,v_k\}$ and let $w(B)$ be the center of this ball. Both $R(B)$ and $w(B)$ can be efficiently computed by solving an appropriate semidefinite program.
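The two preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the paper: the Gram vectors come from a textbook Cholesky factorization, and in place of the semidefinite program the enclosing ball is approximated with the simple Bădoiu–Clarkson iteration, which only yields an approximation of $w(B)$ and $R(B)$. All function names are hypothetical.

```python
import math

def gram_vectors(B, tol=1e-12):
    """Rows v_1..v_k of the Cholesky factor L (B = L L^T), so <v_i, v_j> = B[i][j]."""
    k = len(B)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(s, 0.0))
            else:
                # Guard against zero pivots, which occur when B is singular.
                L[i][j] = s / L[j][j] if L[j][j] > tol else 0.0
    return L

def enclosing_ball(points, iterations=20000):
    """Approximate centre and radius of the smallest ball containing the points.

    Badoiu-Clarkson iteration: repeatedly step towards the farthest point with
    step size 1/(t+1). The returned radius is an upper bound on the true one.
    """
    center = list(points[0])
    for t in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

For instance, for $B=\mathrm{diag}(1,1,2)$ the returned radius squared is close to $0.9$.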
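We now use semidefinite programming to compute the value:

$$\mathrm{SDP}(A|B):=\max\left\{\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle:\ x_1,\ldots,x_n\in\mathbb{R}^n\ \wedge\ \|x_i\|_2\le 1\ \forall i\in\{1,\ldots,n\}\right\}$$
$$=\max\left\{\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle:\ x_1,\ldots,x_n\in S^{n-1}\right\}, \qquad (28)$$

where the last equality in (28) holds since the function $(x_1,\ldots,x_n)\mapsto\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i,x_j\rangle$ is convex (by virtue of the fact that $A$ is positive semidefinite). We claim that

$$\frac{\mathrm{Clust}(A|B)}{R(B)^2}\le\mathrm{SDP}(A|B)\le\frac{\mathrm{Clust}(A|B)}{C(B)}, \qquad (29)$$

which implies that if we output the number $R(B)^2\,\mathrm{SDP}(A|B)$ we will obtain a polynomial time algorithm which approximates $\mathrm{Clust}(A|B)$ up to a factor of $\frac{R(B)^2}{C(B)}$.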

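To verify (29) let $x_1^*,\ldots,x_n^*\in S^{n-1}$ and $\sigma^*:\{1,\ldots,n\}\to\{1,\ldots,k\}$ be such that

$$\mathrm{SDP}(A|B)=\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle x_i^*,x_j^*\rangle,$$

and

$$\mathrm{Clust}(A|B)=\sum_{i=1}^n\sum_{j=1}^n a_{ij}b_{\sigma^*(i)\sigma^*(j)}.$$

Write $(a_{ij})_{i,j=1}^n=(\langle u_i,u_j\rangle)_{i,j=1}^n$ for some $u_1,\ldots,u_n\in\mathbb{R}^n$. The assumption that $A$ is centered means that $\sum_{i=1}^n u_i=0$. The right-hand side inequality in (29) is simply a restatement of Theorem 3.1. The left-hand side inequality in (29) follows from the fact that $\frac{v_{\sigma^*(i)}-w(B)}{R(B)}$ has norm at most 1 for all $i\in\{1,\ldots,n\}$. Indeed, these norm bounds imply that:

$$\mathrm{SDP}(A|B)\ge\sum_{i=1}^n\sum_{j=1}^n a_{ij}\left\langle\frac{v_{\sigma^*(i)}-w(B)}{R(B)},\frac{v_{\sigma^*(j)}-w(B)}{R(B)}\right\rangle=\frac{1}{R(B)^2}\sum_{i=1}^n\sum_{j=1}^n a_{ij}\langle v_{\sigma^*(i)},v_{\sigma^*(j)}\rangle=\frac{\mathrm{Clust}(A|B)}{R(B)^2},$$

where the middle equality holds because $\sum_{i=1}^n u_i=0$ forces $\sum_{i=1}^n a_{ij}=0$ for every $j$, so all the terms involving $w(B)$ vanish.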

This completes the proof that our algorithm approximates efficiently the number $\mathrm{Clust}(A|B)$, but does not address the issue of how to efficiently compute an assignment $\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}$ for which the induced clustering of $A$ has the required value. An inspection of the proof of Theorem 3.1 shows that the issue here is to find efficiently a conical simplicial partition $A_1,\ldots,A_k$ of $\mathbb{R}^{k-1}$ at which $C(B)$ is almost attained, say

$$\sum_{p=1}^k\sum_{q=1}^k b_{pq}\left\langle\int_{A_p}x\,d\gamma_{k-1}(x),\int_{A_q}x\,d\gamma_{k-1}(x)\right\rangle\ge(1-\varepsilon)C(B).$$

Once this partition is computed, using the notation in the proof of Theorem 3.1 we have a randomized algorithm which outputs an assignment $\sigma:\{1,\ldots,n\}\to\{1,\ldots,k\}$ such that

$$\mathbb{E}\left[\sum_{i=1}^n\sum_{j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}\right]\ge\frac{(1-\varepsilon)C(B)}{R(B)^2}\,\mathrm{Clust}(A|B).$$

Note that there is no difficulty in computing $\sigma$ efficiently once the partition $\{A_1,\ldots,A_k\}$ is given, since these sets are simplicial cones. The issue with efficiency here is how to compute this partition in polynomial time. As we discussed in Remark 2.3 this can be done when $k$ is fixed (or grows very slowly with $n$), but we do not know how to do this when, say, $k=\sqrt{n}$.
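To make the rounding step concrete, here is a minimal Python sketch for the special case $k=3$, where the cones live in the plane and can be described by their angles. It assumes the unit vectors `xs` have already been obtained from the SDP and that the cone angles are given; the function name and the angular parametrization of the cones are illustrative assumptions rather than the paper's notation.

```python
import math
import random

def round_to_clusters(xs, cone_angles, rng):
    """Assign each unit vector in xs to one of the planar cones (case k = 3, m = 2).

    cone_angles = (a_1, a_2, a_3) are non-negative and sum to 2*pi; cone p consists
    of the directions whose polar angle lies in [a_1+...+a_{p-1}, a_1+...+a_p).
    """
    n = len(xs[0])
    # G is a 2 x n matrix of i.i.d. standard Gaussians, as in Lemma 3.2 with m = 2.
    G = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
    boundaries = []
    total = 0.0
    for a in cone_angles:
        total += a
        boundaries.append(total)
    labels = []
    for x in xs:
        y0 = sum(g * c for g, c in zip(G[0], x))
        y1 = sum(g * c for g, c in zip(G[1], x))
        theta = math.atan2(y1, y0)
        if theta < 0.0:
            theta += 2.0 * math.pi
        label = 0  # fallback: an angle of 2*pi is the same direction as angle 0
        for p, b in enumerate(boundaries):
            if theta < b:
                label = p
                break
        labels.append(label)
    return labels
```

With the degenerate partition `(math.pi, math.pi, 0.0)` the third cluster is never used and antipodal vectors always land in different clusters, which matches the two half-plane regime described in Remark 6.1.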



5 Matching Unique Games hardness 

In this section we show that for a fixed positive semidefinite matrix $B$, approximating $\mathrm{Clust}(A|B)$ within a ratio strictly smaller than $\frac{R(B)^2}{C(B)}$ is Unique Games hard. We will study functions $f:\{1,\ldots,k\}^n\to\mathbb{R}$ and their Fourier spectrum at the first level. A novel feature of our proof is that our Fourier analysis will be carried out with respect to a distribution on $\{1,\ldots,k\}$ that is not necessarily uniform. In fact, the choice of the distribution itself is dictated by the matrix $B$ as described in Section 5.1.

5.1 Choosing a special probability distribution on $\{1,\ldots,k\}$

Fact 5.1. Let $B=(b_{ij})$ be a $k\times k$ symmetric positive semidefinite matrix and $b_{ij}=\langle v_i,v_j\rangle$ be its Gram representation, where $v_1,\ldots,v_k$ are vectors (w.l.o.g.) in $\mathbb{R}^k$. Let $R(B)$ be the minimum radius of a Euclidean ball containing all these vectors, and $w(B)$ be the center of this ball. Then $w(B)$ is a convex combination of the $v_i$'s that are on the boundary of the ball. In other words, there exist non-negative coefficients $p(1),\ldots,p(k)$ such that $\sum_{i=1}^k p(i)=1$, $w(B)=\sum_{i=1}^k p(i)v_i$ and $p(i)\neq 0$ only if $\|v_i-w(B)\|_2=R(B)$.

Fact 5.1 is well known (see for example the proof of Proposition 1.13 in |T|). Its proof is a simple separation argument. Indeed, define $J:=\{j\in\{1,\ldots,k\}:\ \|v_j-w(B)\|_2=R(B)\}$ and let $K$ be the convex hull of $\{v_j\}_{j\in J}$. Assume for the sake of contradiction that $w(B)\notin K$. Then there would be a hyperplane $H$ separating $w(B)$ from $K$. Moving $w(B)$ a little in the direction of $H$ would turn the equalities on $J$ to strict inequalities, while preserving the strict inequalities off $J$. This contradicts the minimality of $R(B)$.

We intend to use the probability distribution $(p(1),\ldots,p(k))$ from Fact 5.1. However, for technical reasons, we need the probability mass for each atom to be non-zero, and therefore, we will use a very small perturbation of this distribution. Towards this end we define $\mu(i)=(1-\beta)p(i)+\frac{\beta}{k}$ for every $i\in\{1,\ldots,k\}$. The value of $\beta>0$ is chosen to be sufficiently small as in the following lemma.



Lemma 5.2. Fix any $\varepsilon>0$ and the matrix $B$. Then for a sufficiently small $\beta=\beta(\varepsilon,B)>0$,

$$\sum_{i=1}^k\mu(i)\Big\|v_i-\sum_{j=1}^k\mu(j)v_j\Big\|_2^2\ge R(B)^2-\varepsilon. \qquad (30)$$

Proof. Note that if $\beta=0$, then $\mu(i)=p(i)$ for all $i\in\{1,\ldots,k\}$, and

$$\sum_{i=1}^k p(i)\Big\|v_i-\sum_{j=1}^k p(j)v_j\Big\|_2^2=\sum_{i=1}^k p(i)\|v_i-w(B)\|_2^2=R(B)^2,$$

since $p(i)\neq 0$ only if $\|v_i-w(B)\|_2=R(B)$. Thus by continuity for sufficiently small $\beta$ the inequality (30) holds. For concreteness we also give a direct argument which gives a reasonable bound on $\beta$. Assume that
^ <\. Then, using the fact that p>(\ - P)p (point-wise), we see that: 



(=1 



( k 



2x1/2 



> 



2/ 



i=l 
xl/2 



(1-/3) 



7=1 



R ^ 
7=1 



V7) 



2x1/2 



2J 



\i=\ 



7=1 



V7) 



2x1/2 



2/ 



> {i-pf^R{B)-p^fr^ 



k ^ k 

Yp^hYh-''i\ 



All 



7=1 



> (.\-l5fl^R{B)-li^ir^ max Ih - v.lb 

(,;e{l, ...,/:) 

> vr^( i-3/3)/?(g) 

> ^Jl-^f3■R{B), 



where in the penultimate inequality we used the trivial fact that max,jgii_ ||v; - Vy||2 < 2R{B). Thus we 
can take /3 



7R(B)- 



to ensure the validity of ( 130 



Henceforth we fix the probability space $(\Omega=\{1,\ldots,k\},\mu)$. Let $U=(u_{ij})$ be a $k\times k$ orthogonal matrix such that $u_{1j}=\sqrt{\mu(j)}$ for all $j\in\{1,\ldots,k\}$ (such an orthogonal matrix exists since $\sum_{j=1}^k\mu(j)=1$ ensures that $\sum_{j=1}^k u_{1j}^2=1$). Now define random variables $X_1,\ldots,X_k:\{1,\ldots,k\}\to\mathbb{R}$ by $X_i(j)=\frac{u_{ij}}{\sqrt{\mu(j)}}$ (here is one place where we need the atoms of $\mu$ to have positive mass. We will also use this fact to allow for the application of the result of [9] in the proof of Theorem 5.4 below). Then by design $X_1$ is the constant 1 function, and for all $i,j\in\{1,\ldots,k\}$ we have:

$$\sum_{\ell=1}^k\mu(\ell)X_i(\ell)X_j(\ell)=\sum_{\ell=1}^k u_{i\ell}u_{j\ell}=(UU^t)_{ij}=\delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. Similarly:

$$\sum_{\ell=1}^k X_\ell(i)X_\ell(j)=\sum_{\ell=1}^k\frac{u_{\ell i}u_{\ell j}}{\sqrt{\mu(i)\mu(j)}}=\frac{(U^tU)_{ij}}{\sqrt{\mu(i)\mu(j)}}=\frac{\delta_{ij}}{\mu(i)}.$$

By relabeling these random variables (for the sake of simplicity of later notation) we thus obtain the following lemma:

Lemma 5.3. There exist random variables $X_0,X_1,\ldots,X_{k-1}$ on $\Omega$ such that:

• $X_0\equiv 1$.

• For $i,j\in\{0,\ldots,k-1\}$ we have

$$\mathbb{E}_\mu[X_iX_j]=\begin{cases}0 & \text{if } i\neq j,\\ 1 & \text{if } i=j.\end{cases}$$

• For every $\omega,\omega'\in\Omega$ we have

$$\sum_{i=0}^{k-1}X_i(\omega)X_i(\omega')=\begin{cases}0 & \text{if }\omega\neq\omega',\\ \frac{1}{\mu(\omega)} & \text{if }\omega=\omega'.\end{cases}$$
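The basis of Lemma 5.3 can also be produced concretely. The Python sketch below is one possible construction, offered as an illustration rather than the paper's: instead of completing $U$ to an orthogonal matrix it runs Gram–Schmidt in $L_2(\mu)$ on the constant function followed by the indicator functions. The function name is hypothetical, and the atoms of `mu` are assumed to have positive mass.

```python
import math

def orthonormal_basis(mu):
    """Return X_0, ..., X_{k-1} as value tables on {0, ..., k-1}.

    The X_i are orthonormal in L_2(mu) and X_0 is the constant 1 function.
    Assumes every entry of mu is strictly positive and that they sum to 1.
    """
    k = len(mu)

    def inner(f, g):
        return sum(m * a * b for m, a, b in zip(mu, f, g))

    basis = [[1.0] * k]
    for j in range(k):
        if len(basis) == k:
            break
        # Candidate: indicator of the atom j, orthogonalised against the basis so far.
        cand = [1.0 if i == j else 0.0 for i in range(k)]
        for b in basis:
            c = inner(cand, b)
            cand = [x - c * y for x, y in zip(cand, b)]
        norm = math.sqrt(inner(cand, cand))
        if norm > 1e-9:
            basis.append([x / norm for x in cand])
    return basis
```

Both identities of the lemma can then be verified numerically for any strictly positive distribution, for example `mu = [0.5, 0.3, 0.2]`.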

5.2 Dictatorships vs. functions with small influences 

In this section we will associate to every function from $\{1,\ldots,k\}^n$ to

$$\Delta_k:=\left\{x\in\mathbb{R}^k:\ x_i\ge 0\ \ \forall i\in\{1,\ldots,k\},\ \sum_{i=1}^k x_i=1\right\}$$

a numerical parameter, or "objective value". We will show that the value of this parameter for functions which depend only on a single coordinate (i.e. dictatorships) differs markedly from its value on functions which do not depend significantly on any particular coordinate (i.e. functions with small influences). This step is an analog of the "dictatorship test" which is prevalent in PCP based hardness proofs.

We begin with some notation and preliminaries on Fourier-type expansions. For any function $f:\mathbb{R}^n\to\Delta_k$ we write $f=(f_1,f_2,\ldots,f_k)$ where $f_i:\mathbb{R}^n\to[0,1]$ and $\sum_{i=1}^k f_i=1$. With this notation we have

$$C(B)=\sup_{f:\mathbb{R}^{k-1}\to\Delta_k}\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{\mathbb{R}^{k-1}}xf_i(x)\,d\gamma_{k-1}(x),\int_{\mathbb{R}^{k-1}}xf_j(x)\,d\gamma_{k-1}(x)\right\rangle,$$

where $C(B)$ is as in Section 2. We have already seen that the supremum above is actually attained. Also $C(B)$ remains the same if the supremum is taken over functions over $\mathbb{R}^n$ with $n\ge k-1$, i.e. for every $n\ge k-1$,

$$C(B)=\sup_{f:\mathbb{R}^n\to\Delta_k}\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left\langle\int_{\mathbb{R}^n}xf_i(x)\,d\gamma_n(x),\int_{\mathbb{R}^n}xf_j(x)\,d\gamma_n(x)\right\rangle.$$

Let $(\Omega=\{1,\ldots,k\},\mu)$ be the probability space as chosen in Section 5.1. Let $(\Omega^n,\mu^n)$ be the associated product space. We will be analyzing functions $f:\Omega^n\to\Delta_k$ (and more generally into $\mathbb{R}^k$). As in Lemma 5.3 fix a basis of orthonormal random variables on $\Omega$ where one of them is the constant 1 function, that is $\{X_0\equiv 1,X_1,\ldots,X_{k-1}\}$. Then any function $f:\Omega\to\mathbb{R}$ can be written as a linear combination of the $X_i$'s.

In order to analyze functions $f:\Omega^n\to\mathbb{R}$, we let $\mathcal{X}=(\mathcal{X}_1,\mathcal{X}_2,\ldots,\mathcal{X}_n)$ be an "ensemble" of random variables where for $i\in\{1,\ldots,n\}$ we write $\mathcal{X}_i=\{X_{i,0},X_{i,1},\ldots,X_{i,k-1}\}$, and for every $i$, $\{X_{ij}\}_{j=0}^{k-1}$ are independent copies of the $\{X_j\}_{j=0}^{k-1}$. Any $\sigma=(\sigma_1,\sigma_2,\ldots,\sigma_n)\in\{0,1,2,\ldots,k-1\}^n$ will be called a multi-index. We shall denote by $|\sigma|$ the number of non-zero entries in $\sigma$. Each multi-index defines a monomial

$$x_\sigma:=\prod_{\substack{i\in\{1,\ldots,n\}\\ \sigma_i\neq 0}}x_{i\sigma_i}$$

on a set of $n(k-1)$ indeterminates $\{x_{ij}\mid i\in\{1,\ldots,n\},\ j\in\{1,2,\ldots,k-1\}\}$, and also a random variable $X_\sigma:\Omega^n\to\mathbb{R}$ as

$$X_\sigma(\omega):=\prod_{i=1}^n X_{\sigma_i}(\omega_i).$$

The random variables $\{X_\sigma\}_\sigma$ form an orthonormal basis for the space of functions $f:\Omega^n\to\mathbb{R}$. Thus, every such $f$ can be written uniquely as (the "Fourier expansion")

$$f=\sum_\sigma\hat f(\sigma)X_\sigma,\qquad\hat f(\sigma)\in\mathbb{R}.$$

We denote the corresponding multi-linear polynomial as $Q_f=\sum_\sigma\hat f(\sigma)x_\sigma$. One can think of $f$ as the polynomial $Q_f$ applied to the ensemble $\mathcal{X}$, i.e. $f=Q_f(\mathcal{X})$. Of course, one can also apply $Q_f$ to any other ensemble, and specifically to the Gaussian ensemble $\mathcal{G}=(\mathcal{G}_1,\mathcal{G}_2,\ldots,\mathcal{G}_n)$ where $\mathcal{G}_i=\{G_{i,0}\equiv 1,G_{i,1},\ldots,G_{i,k-1}\}$ and $G_{ij}$, $i\in\{1,\ldots,n\}$, $j\in\{1,\ldots,k-1\}$ are i.i.d. standard Gaussians. Define the influence of the $i$'th variable on $f$ as

$$\mathrm{Inf}_i(f):=\sum_{\sigma:\ \sigma_i\neq 0}\hat f(\sigma)^2.$$

Roughly speaking, the results of |[T2l |3 say that if / : Q" — > [0, 1] is a function all of whose influences 
are small, then / = Qf{X) and Qf{Q) are almost identically distributed, and in particular, the values of 
Qf{Q) are essentially contained in [0, 1]. Note that QfiQ) is a random variable on the probability space 

(R«(^-i),r„(,_i)). 

Consider functions $f:\Omega^n\to\Delta_k$. We write $f=(f_1,f_2,\ldots,f_k)$ where $f_i:\Omega^n\to[0,1]$ with $\sum_{i=1}^k f_i=1$. Each $f_i$ has a unique representation (along with the corresponding multi-linear polynomial)

$$f_i=\sum_\sigma\hat f_i(\sigma)X_\sigma,\qquad Q_i:=Q_{f_i}=\sum_\sigma\hat f_i(\sigma)x_\sigma.$$

We shall define an objective function $\mathrm{OBJ}(f)$ that is a positive semidefinite quadratic form on the table of values of $f$ which corresponds to a centered symmetric positive semidefinite bilinear form. Then we analyze the value of this objective function when $f$ is a "dictatorship" versus when $f$ has all low influences.

The objective value

For a function $f:\Omega^n\to\Delta_k$ (or more generally, $f:\Omega^n\to\mathbb{R}^k$) define

$$\mathrm{OBJ}(f):=\sum_{i=1}^k\sum_{j=1}^k b_{ij}\left(\sum_{\sigma:\ |\sigma|=1}\hat f_i(\sigma)\hat f_j(\sigma)\right). \qquad (31)$$

Note that there are $n(k-1)$ multi-indices $\sigma$ such that $|\sigma|=1$.



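The objective value for dictatorships

For $\ell\in\{1,\ldots,n\}$ we define a dictatorship function $f^{\mathrm{dict},\ell}:\Omega^n\to\Delta_k$ as follows. The range of the function is limited to only $k$ points in $\Delta_k$, namely the points $\{e_1,e_2,\ldots,e_k\}$ where $e_i$ is a vector with $i$'th coordinate 1 and all other coordinates zero.

$$f^{\mathrm{dict},\ell}(\omega):=e_{\omega_\ell}. \qquad (32)$$

In other words, when one writes $f^{\mathrm{dict},\ell}=(f_1,f_2,\ldots,f_k)$, for $i\in\{1,\ldots,k\}$, $f_i$ is $\{0,1\}$-valued and $f_i(\omega)=1$ if and only if $\omega_\ell=i$. The Fourier expansion of $f_i$ is

$$f_i(\omega)=\sum_{\sigma:\ \sigma_j=0\ \forall j\neq\ell}\mu(i)X_{\sigma_\ell}(i)X_\sigma(\omega). \qquad (33)$$

Indeed, the right hand side of (33) equals

$$\mu(i)\sum_{0\le\sigma_\ell\le k-1}X_{\sigma_\ell}(i)X_{\sigma_\ell}(\omega_\ell)=\begin{cases}1 & \text{if }\omega_\ell=i,\\ 0 & \text{otherwise}\end{cases}\qquad\text{(see Lemma 5.3).}$$

Thus,

$$\mathrm{OBJ}\left(f^{\mathrm{dict},\ell}\right)=\sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j\rangle\left(\sum_{\sigma:\ |\sigma|=1}\hat f_i(\sigma)\hat f_j(\sigma)\right)=\sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j\rangle\cdot\mu(i)\mu(j)\left(\sum_{r=1}^{k-1}X_r(i)X_r(j)\right)$$
$$=\sum_{i=1}^k\sum_{j=1}^k\langle v_i,v_j\rangle\cdot\mu(i)\mu(j)\left(\sum_{r=0}^{k-1}X_r(i)X_r(j)-1\right)=\sum_{i\neq j}\langle v_i,v_j\rangle\cdot\mu(i)\mu(j)(-1)+\sum_{i=1}^k\langle v_i,v_i\rangle\cdot\mu(i)^2\left(\frac{1}{\mu(i)}-1\right)$$
$$=\sum_{i=1}^k\mu(i)\Big\|v_i-\sum_{j=1}^k\mu(j)v_j\Big\|_2^2\ge R(B)^2-\varepsilon, \qquad (34)$$

using Lemma 5.2.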
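The last expression in (34) is a weighted variance of the vectors $v_i$, and the effect of the perturbation $\beta$ from Lemma 5.2 can be observed numerically. The following Python sketch is illustrative only; the example distribution $p$ for $B=\mathrm{diag}(1,1,c)$ is computed by hand from the barycentric coordinates of the circumcenter and is an assumption of this example, not a formula stated in the paper.

```python
import math

def perturbed_distribution(p, beta):
    """mu(i) = (1 - beta) * p(i) + beta / k, as defined before Lemma 5.2."""
    k = len(p)
    return [(1.0 - beta) * pi + beta / k for pi in p]

def dictatorship_objective(vectors, mu):
    """sum_i mu(i) * ||v_i - sum_j mu(j) v_j||_2^2, the final expression in (34)."""
    dim = len(vectors[0])
    mean = [sum(m * v[t] for m, v in zip(mu, vectors)) for t in range(dim)]
    return sum(m * sum((v[t] - mean[t]) ** 2 for t in range(dim))
               for m, v in zip(mu, vectors))

if __name__ == "__main__":
    c = 2.0
    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, math.sqrt(c)]]
    # Hand-computed weights of the circumcenter (assumption of this example).
    p = [(1 + c) / (2 + 4 * c), (1 + c) / (2 + 4 * c), 2 * c / (2 + 4 * c)]
    for beta in (0.0, 0.01, 0.1):
        print(beta, dictatorship_objective(vectors, perturbed_distribution(p, beta)))
```

At $\beta=0$ the value equals $R(B)^2=(1+c)^2/(2+4c)$, and it decreases only slightly for small $\beta>0$.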



The objective value for functions with low influences

For $f:\Omega^n\to\mathbb{R}$, $j\in\{1,\ldots,n\}$ and $m\in\mathbb{N}$ denote (the "degree $m$-influence" of $f$):

$$\mathrm{Inf}_j^{\le m}(f):=\sum_{\substack{\sigma:\ \sigma_j\neq 0\\ |\sigma|\le m}}\hat f(\sigma)^2.$$

For every $0\le\rho\le 1$ we will use the smoothing operator:

$$T_\rho f:=\sum_\sigma\rho^{|\sigma|}\hat f(\sigma)X_\sigma.$$

Equivalently,

$$T_\rho f(\omega_1,\ldots,\omega_n)=\mathbb{E}\left[f(\omega_1',\ldots,\omega_n')\right],$$

where independently for each $i$, $\omega_i'$ is chosen to be $\omega_i$ with probability $\rho$ and a random (with respect to the underlying distribution $\mu$) element in $\Omega$ with probability $1-\rho$.

The following theorem is the key analytic fact used in our UGC hardness result:

Theorem 5.4. For every $\varepsilon>0$, there exists $\tau>0$ so that the following holds: for any function $f:\Omega^n\to\Delta_k$ which satisfies

$$\forall\, i\in\{1,\ldots,k\},\ \forall\, j\in\{1,\ldots,n\},\qquad\mathrm{Inf}_j^{\le\log(1/\tau)}(f_i)\le\tau$$

we have,

$$\mathrm{OBJ}(f)\le C(B)+\varepsilon.$$

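Proof. Let $\delta,\eta>0$ be sufficiently small constants to be chosen later. Let $Q_i=Q_{f_i}$ be the multi-linear polynomial associated with $f_i$. Recall that $Q_i$ is a multi-linear polynomial in the $n(k-1)$ indeterminates $\{x_{jp}\mid j\in\{1,\ldots,n\},\ p\in\{1,\ldots,k-1\}\}$. Moreover $f_i=Q_i(\mathcal{X})$ has range $[0,1]$ and $\sum_{i=1}^k f_i=1$.

Let $R_i=(T_{1-\delta}Q_i)(\mathcal{X})$ and $S_i=(T_{1-\delta}Q_i)(\mathcal{G})$ (the smoothing operator $T_{1-\delta}$ helps us meet some technical pre-conditions before applying the invariance principle of [9]). Note that $R_i$ has range $[0,1]$ and $S_i$ has range $\mathbb{R}$. It will follow however from [9] that $S_i$ is essentially in $[0,1]$. First we relate $\mathrm{OBJ}(f)$ to the functions $S_i$ which will, up to truncation, induce a partition of $\mathbb{R}^{n(k-1)}$, which in turn will give the bound in terms of $C(B)$.

$$(1-\delta)^2\,\mathrm{OBJ}(f)=(1-\delta)^2\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\sum_{\sigma:\ |\sigma|=1}\hat f_i(\sigma)\hat f_\ell(\sigma)$$
$$=(1-\delta)^2\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\sum_{j=1}^n\sum_{p=1}^{k-1}\left(\int_{\mathbb{R}^{n(k-1)}}x_{jp}Q_i(x)\,d\gamma_{n(k-1)}(x)\right)\cdot\left(\int_{\mathbb{R}^{n(k-1)}}x_{jp}Q_\ell(x)\,d\gamma_{n(k-1)}(x)\right)$$
$$=(1-\delta)^2\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\left\langle\int_{\mathbb{R}^{n(k-1)}}xQ_i(x)\,d\gamma_{n(k-1)}(x),\int_{\mathbb{R}^{n(k-1)}}xQ_\ell(x)\,d\gamma_{n(k-1)}(x)\right\rangle$$
$$=\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\left\langle\int_{\mathbb{R}^{n(k-1)}}x\,(T_{1-\delta}Q_i)(x)\,d\gamma_{n(k-1)}(x),\int_{\mathbb{R}^{n(k-1)}}x\,(T_{1-\delta}Q_\ell)(x)\,d\gamma_{n(k-1)}(x)\right\rangle$$
$$=\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\left\langle\int_{\mathbb{R}^{n(k-1)}}xS_i(x)\,d\gamma_{n(k-1)}(x),\int_{\mathbb{R}^{n(k-1)}}xS_\ell(x)\,d\gamma_{n(k-1)}(x)\right\rangle. \qquad (35)$$

We shall now bound the last term above by $C(B)+o(1)$. For any real-valued function $h$ on $\mathbb{R}^{n(k-1)}$, let

$$\mathrm{chop}(h)(x):=\begin{cases}0 & \text{if } h(x)<0,\\ h(x) & \text{if } h(x)\in[0,1],\\ 1 & \text{if } h(x)>1.\end{cases}$$

Applying Theorem 3.20 in [9] to the polynomial $Q_i$, it follows that (provided $\tau$ is sufficiently small compared to $\delta$ and $\eta$),

$$\|S_i-\mathrm{chop}(S_i)\|^2_{L_2(\mathbb{R}^{n(k-1)},\gamma_{n(k-1)})}=\int_{\mathbb{R}^{n(k-1)}}|S_i(x)-\mathrm{chop}(S_i)(x)|^2\,d\gamma_{n(k-1)}(x)\le\eta. \qquad (36)$$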

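The functions $\mathrm{chop}(S_i)$ are almost what we want except that they might not sum up to 1. So further define

$$S_i^*(x):=\frac{\mathrm{chop}(S_i)(x)}{\sum_{j=1}^k\mathrm{chop}(S_j)(x)}.$$

Clearly, $\{S_i^*\}_{i=1}^k$ have range $[0,1]$ and $\sum_{i=1}^k S_i^*=1$. Observe that the following holds point-wise:

$$\Big|\sum_{j=1}^k\mathrm{chop}(S_j)-1\Big|=\Big|\sum_{j=1}^k\left(\mathrm{chop}(S_j)-S_j\right)\Big|\le\sum_{j=1}^k|S_j-\mathrm{chop}(S_j)|,$$

where we used that $\sum_{j=1}^k S_j=T_{1-\delta}\sum_{j=1}^k Q_j=T_{1-\delta}1=1$. It follows that for all $i\in\{1,\ldots,k\}$ we have:

$$\|\mathrm{chop}(S_i)-S_i^*\|_{L_2(\gamma_{n(k-1)})}\le\Big\|\sum_{j=1}^k\mathrm{chop}(S_j)-\sum_{j=1}^k S_j\Big\|_{L_2(\gamma_{n(k-1)})}\le\sum_{j=1}^k\|S_j-\mathrm{chop}(S_j)\|_{L_2(\gamma_{n(k-1)})}\le k\sqrt{\eta},$$

where we used (36). Finally,

$$\|S_i-S_i^*\|_{L_2(\gamma_{n(k-1)})}\le\|S_i-\mathrm{chop}(S_i)\|_{L_2(\gamma_{n(k-1)})}+\|\mathrm{chop}(S_i)-S_i^*\|_{L_2(\gamma_{n(k-1)})}\le(k+1)\sqrt{\eta}. \qquad (37)$$

Now write

$$u_i:=\int_{\mathbb{R}^{n(k-1)}}xS_i(x)\,d\gamma_{n(k-1)}(x),\qquad w_i:=\int_{\mathbb{R}^{n(k-1)}}xS_i^*(x)\,d\gamma_{n(k-1)}(x). \qquad (38)$$

The norm of $u_i-w_i$ is bounded by $(k+1)\sqrt{\eta}$ using (37) and Lemma 5.5 below. Since $|S_i^*|\le 1$, the norm of $w_i$ is bounded by 1. Returning to the estimation in Equation (35) and applying Lemma 5.6 below, we see that: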



k k 



k k 



( k k 



(1 - 6f ■ 0Bj(/) = y,Ya > - Z Z "^^^ + v^) Z Z 

Since Zf=i = 1 we have 



i=i f=i 



(=1 f=i 



; /} bit{wuW[) hA\ X S*{x)dy„(k-i){x), I x S*i{x)dyn(k- 

' k k I „ „ 

^i^^Mil X fi{x)dyn{k-\){x), I xfe{x)dyn, 



< sup 



(/(:-l)(-^) 



C(B). 



It follows that $\mathrm{OBJ}(f)\le C(B)+\varepsilon$, provided that $\eta$ and $\delta$ are small enough. □

Lemma 5.5. Let $g\in L_2(\mathbb{R}^n,\gamma_n)$. Then

$$\left\|\int_{\mathbb{R}^n}xg(x)\,d\gamma_n(x)\right\|_2\le\|g\|_{L_2(\mathbb{R}^n,\gamma_n)}.$$

Proof. Note that the square of the left hand side equals

$$\sum_{i=1}^n\left(\int_{\mathbb{R}^n}x_ig(x)\,d\gamma_n(x)\right)^2=\sum_{i=1}^n\langle x_i,g\rangle^2.$$

Since $x_i\in L_2(\mathbb{R}^n,\gamma_n)$ are an orthonormal set of functions, the sum of squares of projections of $g$ onto them is at most the squared norm of $g$. □

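Lemma 5.6. Suppose $\{u_i\}_{i=1}^k$ and $\{w_i\}_{i=1}^k$ are vectors in $\mathbb{R}^n$ such that $\|u_i-w_i\|_2\le d$ for every $i\in\{1,\ldots,k\}$ and $\|w_i\|_2\le 1$. Let $B=(b_{ij})$ be a $k\times k$ matrix. Then

$$\left|\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\langle u_i,u_\ell\rangle-\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\langle w_i,w_\ell\rangle\right|\le(2d+d^2)\sum_{i=1}^k\sum_{\ell=1}^k|b_{i\ell}|.$$

Proof. From the given conditions on the norms of $a_i=u_i-w_i$ and $w_i$, it follows that for any $i,\ell\in\{1,\ldots,k\}$,

$$|\langle u_i,u_\ell\rangle-\langle w_i,w_\ell\rangle|\le|\langle a_i,w_\ell\rangle|+|\langle a_\ell,w_i\rangle|+|\langle a_i,a_\ell\rangle|\le 2d+d^2.$$

Hence,

$$\left|\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\langle u_i,u_\ell\rangle-\sum_{i=1}^k\sum_{\ell=1}^k b_{i\ell}\langle w_i,w_\ell\rangle\right|\le\sum_{i=1}^k\sum_{\ell=1}^k|b_{i\ell}|\,|\langle u_i,u_\ell\rangle-\langle w_i,w_\ell\rangle|\le(2d+d^2)\sum_{i=1}^k\sum_{\ell=1}^k|b_{i\ell}|,$$

as required. □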



The intended hardness factor

As we show next, the dictatorship test can be translated (in a more or less standard way by now) into a Unique Games hardness result. The hardness factor (as usual) turns out to be the ratio of the objective value when the function is a dictatorship versus when the function has all low influences, i.e.

$$\frac{R(B)^2-\varepsilon}{C(B)+\varepsilon}=\frac{R(B)^2}{C(B)}-o(1).$$

5.3 The reduction from unique games to kernel clustering 

Given a Unique Games instance $\mathcal{L}(G(V,W,E),n,\{\pi_{vw}\}_{(v,w)\in E})$, we construct an instance of the clustering problem.

Reformulation of the clustering problem

As in our earlier paper [7], we first reformulate the kernel clustering problem for ease of presentation. As observed there, we can reformulate it as (the matrix $A$ in the problem $\mathrm{Clust}(A|B)$ is captured by the quadratic form $Q$ below):

Kernel Clustering Problem: Given a $k\times k$ symmetric positive semidefinite matrix $B$, and a symmetric positive semidefinite quadratic form $Q(\cdot,\cdot)$ on $\mathbb{R}^N\times\mathbb{R}^N$, find $F:\{1,\ldots,N\}\to\Delta_k$, $F=(F_1,F_2,\ldots,F_k)$, so as to maximize $\sum_{i=1}^k\sum_{j=1}^k b_{ij}Q(F_i,F_j)$.

The clustering problem instance

Given a Unique Games instance $\mathcal{L}(G(V,W,E),n,\{\pi_{vw}\}_{(v,w)\in E})$, the clustering problem is to find a function $F:W\times\Omega^n\to\Delta_k$ so as to maximize $\sum_{i=1}^k\sum_{j=1}^k b_{ij}Q(F_i,F_j)$ where $Q$ is a suitably defined symmetric positive semidefinite quadratic form. For notational convenience, we write:

$$F_w:=F(w,\cdot),\qquad F_w:\Omega^n\to\Delta_k.$$

Also, for every $v\in V$, we write:

$$F_v:=\mathbb{E}_{(v,w)\in E}\left[F_w\circ\pi_{vw}\right],\qquad F_v:\Omega^n\to\Delta_k.$$

We used the following notation: for any function $g:\Omega^n\to\Delta_k$ and $\pi:\{1,\ldots,n\}\to\{1,\ldots,n\}$ we write $g\circ\pi:\Omega^n\to\Delta_k$ for the function $(g\circ\pi)(\omega):=g(\omega_{\pi(1)},\omega_{\pi(2)},\ldots,\omega_{\pi(n)})$. As usual, we denote $F_w=(F_{w,1},F_{w,2},\ldots,F_{w,k})$ where each $F_{w,i}$ has range $[0,1]$ and $\sum_{i=1}^k F_{w,i}=1$. Similarly, $F_v=(F_{v,1},F_{v,2},\ldots,F_{v,k})$ and $\sum_{i=1}^k F_{v,i}=1$. Now we are ready to define the clustering problem instance.



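Clustering instance: The goal is to find $F:W\times\Omega^n\to\Delta_k$ so as to maximize:

$$\max_{F:W\times\Omega^n\to\Delta_k}\mathbb{E}_{v\in V}\left[\mathrm{OBJ}(F_v)\right]=\max_{F:W\times\Omega^n\to\Delta_k}\mathbb{E}_{v\in V}\left[\sum_{i=1}^k\sum_{j=1}^k b_{ij}\sum_{\sigma:\ |\sigma|=1}\hat F_{v,i}(\sigma)\hat F_{v,j}(\sigma)\right]. \qquad (39)$$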



Completeness

We will show that if the Unique Games instance has an almost satisfying labeling, then the objective value of the clustering problem is at least $R(B)^2-o(1)$. So, let $\rho:V\cup W\to\{1,\ldots,n\}$ be the labeling, such that for at least $1-\varepsilon$ fraction of the vertices $v\in V$ (call such $v$ good) we have

$$\pi_{vw}(\rho(w))=\rho(v)\qquad\forall\,(v,w)\in E.$$

Define $F:W\times\Omega^n\to\Delta_k$ as follows: for every $w\in W$, $F_w:\Omega^n\to\Delta_k$ equals the dictatorship corresponding to $\rho(w)\in\{1,\ldots,n\}$, i.e.,

$$F_w:=f^{\mathrm{dict},\rho(w)}.$$

Lemma 5.7 ([7]). For a good $v\in V$ we have $F_v=f^{\mathrm{dict},\rho(v)}$.

Thus the contribution of $v$ in (39) is $\mathrm{OBJ}(f^{\mathrm{dict},\rho(v)})\ge R(B)^2-\varepsilon$ as observed in Equation (34). Since $1-\varepsilon$ fraction of $v\in V$ are good, (39) is at least $(1-\varepsilon)\cdot(R(B)^2-\varepsilon)=R(B)^2-o(1)$.



Soundness 

Suppose for the sake of contradiction that the value of (39) is at least $C(B)+2\varepsilon$. As in [7], it can be proved that the Unique Games instance must have a labeling that satisfies at least a constant fraction of its edges, the constant depending on the parameter $\tau$ used in Theorem 5.4. This is a contradiction, provided the soundness of the Unique Games instance is chosen to be even lower to begin with. The proof is the same as in [7], by replacing the $C(k)$ therein by $C(B)$ ([7] focused on the case when $B$ is the $k\times k$ identity matrix. The constant $C(k)$ therein is the same as our constant $C(B)$ when $B$ is the $k\times k$ identity matrix).

6 A concrete example

In this section we will use our results to evaluate the UGC hardness threshold of the problem of computing

$$\mathrm{Clust}\left(A\,\middle|\,\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & c\end{pmatrix}\right), \qquad (40)$$

where $A\in M_n(\mathbb{R})$ is centered, symmetric and positive semidefinite and $c\in(0,\infty)$ is a parameter. The case $c=1$, corresponding to $B=I_3$ (the $3\times 3$ identity matrix) was evaluated in [7], where it was shown that the UGC hardness threshold in this case equals $\frac{16\pi}{27}$.

For general $c>0$ the optimization problem in (40) corresponds to the following question: given $n$ random variables $X_1,\ldots,X_n$ the goal is to partition them into three sets $S_1,S_2,S_3\subseteq\{1,\ldots,n\}$ such that

$$\sum_{i,j\in S_1}\mathbb{E}[X_iX_j]+\sum_{i,j\in S_2}\mathbb{E}[X_iX_j]+c\sum_{i,j\in S_3}\mathbb{E}[X_iX_j] \qquad (41)$$

is maximized. Thus we wish to cluster the variables into three clusters so as to maximize the intra-cluster correlations, while the parameter $c$ allows us to tune the relative importance of one of the clusters. We stress that we do not claim that this optimization problem is of particular intrinsic importance. We chose it as a way to concretely demonstrate our results for the simplest possible perturbation of the case of $B=I_3$. We remark that it is also possible to explicitly solve the case of general $3\times 3$ diagonal matrices $B$, i.e., the case of a general weighting of the clusters in (41). The formula for the UGC hardness threshold for general $3\times 3$ diagonal matrices turns out to be quite complicated, so we chose to deal only with (40) as a simple example for the sake of illustration. Note that for $3\times 3$ matrices the characterization of $C(B)$ in terms of planar conical partitions is particularly simple, and allows for explicit computations of the UGC hardness threshold in additional cases.

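Denote $B:=\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & c\end{pmatrix}=(\langle v_i,v_j\rangle)_{i,j=1}^3$, where $v_1=(1,0,0)$, $v_2=(0,1,0)$, $v_3=(0,0,\sqrt{c})\in\mathbb{R}^3$. The side lengths of the triangle whose vertices are $v_1,v_2,v_3$ are $\ell_1=\sqrt{1+c}$, $\ell_2=\sqrt{1+c}$, $\ell_3=\sqrt{2}$. Note that this is an acute triangle, so its smallest bounding circle coincides with its circumcircle, and therefore its radius is given by Q :

$$R(B)^2=\frac{\ell_1^2\ell_2^2\ell_3^2}{(\ell_1+\ell_2+\ell_3)(-\ell_1+\ell_2+\ell_3)(\ell_1-\ell_2+\ell_3)(\ell_1+\ell_2-\ell_3)}=\frac{(1+c)^2}{2+4c}. \qquad (42)$$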



We shall now compute C{B). By Lemma 
consists of disjoint cones of angles a\,a2,aT, 
shows that for 7 e {1, 2, 3) we have: 



the partition {Ai, A2, A3) of at which C{B) is attained 
[0, 27r] where a\ + a2 + aj, = In. A direct computation 



J 



xdy2{x) 



In' 



sm 



Hence 



1 

C{B) - — max | sm' 

In Q'l,Q'2,ff3£[0,2n-] 
a I +Q'2+Q'3 =1n 



+ sm 



+ c sm' 



(43) 
Assume for the moment that the maximum in (43) is attained when $\alpha_1, \alpha_2, \alpha_3 \in (0, 2\pi)$. Then using Lagrange
multipliers we see that $\sin\alpha_1 = \sin\alpha_2 = c\sin\alpha_3$. This implies in particular that either $\alpha_1 = \alpha_2$ or (since
$\alpha_1, \alpha_2, \alpha_3 \in (0, 2\pi)$ and $\alpha_1 + \alpha_2 + \alpha_3 = 2\pi$) $\alpha_1 + \alpha_2 = \pi$. In the latter case $\alpha_3 = \pi$, and it follows from the
Lagrange multiplier equations that $\sin\alpha_1 = \sin\alpha_2 = 0$, which forces one of $\{\alpha_1, \alpha_2\}$ to vanish, contrary to
our assumption. Hence we know that $\alpha_1 = \alpha_2 := \alpha$. Then $\alpha_3 = 2\pi - 2\alpha$, and since $\alpha_3 \in (0, 2\pi)$ we also
know that $\alpha \in (0, \pi)$. The Lagrange multiplier equations imply that $\sin\alpha = c\sin(2\pi - 2\alpha) = -2c\sin\alpha\cos\alpha$.
Thus $\cos\alpha = -\frac{1}{2c}$, and in particular we see that necessarily $c > \frac12$. It follows that

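\[
\sin^2\left(\frac{\alpha}{2}\right) = \frac{1 - \cos\alpha}{2} = \frac{2c+1}{4c}
\]

and

\[
\sin^2\left(\frac{\alpha_3}{2}\right) = \sin^2(\pi - \alpha) = 1 - \cos^2\alpha = 1 - \frac{1}{4c^2}.
\]

Hence in this case:

\[
\sin^2\left(\frac{\alpha_1}{2}\right) + \sin^2\left(\frac{\alpha_2}{2}\right) + c\,\sin^2\left(\frac{\alpha_3}{2}\right) = 2\cdot\frac{2c+1}{4c} + c\cdot\frac{4c^2-1}{4c^2} = \frac{(2c+1)^2}{4c}. \tag{44}
\]
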

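It remains to deal with the boundary case $\{\alpha_1, \alpha_2, \alpha_3\} \cap \{0, 2\pi\} \neq \emptyset$, which as we have seen above is where the maximum in (43) is necessarily attained if $c \le \frac12$. If one of $\{\alpha_1, \alpha_2, \alpha_3\}$ equals $2\pi$ then the others must vanish, in which case $\sin^2\left(\frac{\alpha_1}{2}\right) + \sin^2\left(\frac{\alpha_2}{2}\right) + c\,\sin^2\left(\frac{\alpha_3}{2}\right) = 0$. If one of $\{\alpha_1, \alpha_2, \alpha_3\}$ vanishes then in order to maximize $\sin^2\left(\frac{\alpha_1}{2}\right) + \sin^2\left(\frac{\alpha_2}{2}\right) + c\,\sin^2\left(\frac{\alpha_3}{2}\right)$ the other two must equal $\pi$, in which case the maximum value of this quantity is $\max\{2, 1+c\}$. Since $\max\{2, 1+c\}$ never exceeds the quantity $\frac{(2c+1)^2}{4c}$ from (44) it follows that the maximum of $\sin^2\left(\frac{\alpha_1}{2}\right) + \sin^2\left(\frac{\alpha_2}{2}\right) + c\,\sin^2\left(\frac{\alpha_3}{2}\right)$ over $\{\alpha_1 + \alpha_2 + \alpha_3 = 2\pi \ \wedge\ \alpha_1, \alpha_2, \alpha_3 \in [0, 2\pi]\}$ equals $\frac{(2c+1)^2}{4c}$ when $c > \frac12$ and equals $2$ when $c \le \frac12$. We therefore proved that

\[
C(B) = \begin{cases} \dfrac{(2c+1)^2}{8\pi c} & \text{if } c > \frac12, \\[2mm] \dfrac{1}{\pi} & \text{if } c \le \frac12. \end{cases} \tag{45}
\]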
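The piecewise maximum leading to (45) can be cross-checked by brute force. The following sketch is only an illustration, with hypothetical function names and an arbitrary grid resolution: it maximizes the objective of (43) over a grid of admissible angles and compares the result with the closed form, $\frac{(2c+1)^2}{4c}$ for $c > \frac12$ and $2$ otherwise.

```python
import math

def objective(a1, a2, c):
    """sin^2(a1/2) + sin^2(a2/2) + c sin^2(a3/2) with a3 = 2 pi - a1 - a2."""
    a3 = 2 * math.pi - a1 - a2
    return math.sin(a1 / 2) ** 2 + math.sin(a2 / 2) ** 2 + c * math.sin(a3 / 2) ** 2

def grid_max(c, n=360):
    """Brute-force maximum over angles that are multiples of 2 pi / n."""
    step = 2 * math.pi / n
    best = 0.0
    for i in range(n + 1):
        for j in range(n + 1 - i):  # keeps a3 >= 0
            best = max(best, objective(i * step, j * step, c))
    return best

def closed_form_max(c):
    """Maximum derived in the text: interior critical point or boundary value."""
    return (2 * c + 1) ** 2 / (4 * c) if c > 0.5 else 2.0

def C_of_B(c):
    """Right-hand side of (45)."""
    return closed_form_max(c) / (2 * math.pi)

for c in (0.25, 0.5, 1.0, 3.0):
    approx, exact = grid_max(c), closed_form_max(c)
    # The grid can only undershoot the true maximum, and only slightly.
    assert exact - 1e-3 <= approx <= exact + 1e-9
```

A grid search is crude but makes no use of the Lagrange analysis, so agreement with the closed form is an independent check of both the interior and the boundary cases.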

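By combining (42) with (45) we conclude that the UGC hardness threshold for computing (40) is:

\[
\frac{R(B)^2}{C(B)} = \begin{cases} \dfrac{4\pi c\,(1+c)^2}{(1+2c)^3} & \text{if } c > \frac12, \\[2mm] \dfrac{\pi (1+c)^2}{2+4c} & \text{if } c \le \frac12. \end{cases} \tag{46}
\]
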
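For concreteness, the threshold in (46) can be evaluated directly. The short sketch below uses a hypothetical helper name and simply encodes the two branches; it checks that they agree at $c = \frac12$ and that $c = 1$ gives $\frac{16\pi}{27}$, which follows by substituting into the first branch.

```python
import math

def ugc_threshold(c):
    """R(B)^2 / C(B) for B = diag(1, 1, c) with c > 0, following (46)."""
    if c <= 0:
        raise ValueError("c must be positive")
    if c > 0.5:
        return 4 * math.pi * c * (1 + c) ** 2 / (1 + 2 * c) ** 3
    return math.pi * (1 + c) ** 2 / (2 + 4 * c)

# c = 1 corresponds to the unperturbed diagonal matrix diag(1, 1, 1).
assert math.isclose(ugc_threshold(1.0), 16 * math.pi / 27, rel_tol=1e-12)

# The two branches of (46) match at the phase transition c = 1/2.
upper_branch_at_half = 4 * math.pi * 0.5 * 1.5 ** 2 / 2.0 ** 3
assert math.isclose(ugc_threshold(0.5), upper_branch_at_half, rel_tol=1e-12)
```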

Remark 6.1. An inspection of the above argument, in combination with our algorithm that was presented in
Section m shows that the phase transition in (46) at $c = \frac12$ corresponds to a qualitative change in the optimal
algorithm: after shifting the vectors $\{v_1, \ldots, v_k\}$ so that $w(B) = 0$ and renormalizing by $R(B)$, for $c > \frac12$ the
algorithm projects the points obtained from the SDP to $\mathbb{R}^2$ and classifies them according to a partition of $\mathbb{R}^2$
into three cones of positive measure, while for $c \le \frac12$ the partitioning is into two half-planes and the third set
(the one weighted by $c$) is empty.
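The qualitative change described in the remark can be made explicit by tabulating the optimal cone angles from the Lagrange analysis: $\alpha_1 = \alpha_2 = \arccos\left(-\frac{1}{2c}\right)$ and $\alpha_3 = 2\pi - 2\alpha_1$ for $c > \frac12$, and $(\pi, \pi, 0)$ otherwise. The helper below is a hypothetical illustration of that case split only; it does not implement the SDP-based algorithm.

```python
import math

def optimal_cone_angles(c):
    """Angles (a1, a2, a3) of the maximizing conical partition for diag(1, 1, c)."""
    if c <= 0:
        raise ValueError("c must be positive")
    if c > 0.5:
        a = math.acos(-1 / (2 * c))  # interior critical point: cos(a) = -1/(2c)
        return (a, a, 2 * math.pi - 2 * a)
    # Boundary case: two half-planes, and the cone weighted by c is empty.
    return (math.pi, math.pi, 0.0)

# For c = 1 the three cones are congruent, each of angle 2 pi / 3.
for angle in optimal_cone_angles(1.0):
    assert math.isclose(angle, 2 * math.pi / 3, rel_tol=1e-12)

# As c decreases to 1/2 the third cone shrinks to an empty set.
assert optimal_cone_angles(0.25) == (math.pi, math.pi, 0.0)
```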